Abstract:
APPARATUS AND METHOD FOR CREATING AN ENCODED SIGNAL OR FOR DECODING AN ENCODED AUDIO SIGNAL USING A MULTIPLE OVERLAP PART. An apparatus for creating an encoded signal comprises: a window sequence controller (808) for creating window sequence information (809) for windowing an audio or image signal, the window sequence information indicating a first window function (1500) for creating a first frame of spectral values, and a second window function (1502) and at least one third window function (1503) for creating a second frame of spectral values, wherein the first window function (1500), the second window function (1502) and the at least one third window function overlap within a multiple overlap zone (1300); and a preprocessor (802) for windowing (902) a second block of samples corresponding to the second window function and the at least one third window function using an auxiliary window function (1100) to obtain a second block of windowed samples, and for pre-processing (904) the second block of windowed samples using a folding operation that folds a part of the second block overlapping with a first block into the multiple overlap part (1300) to obtain (...).
Publication number: BR112015019270B1
Application number: R112015019270-0
Filing date: 2014-02-20
Publication date: 2021-02-17
Inventors: Christian Helmrich; Jérémie Lecomte; Goran Markovic; Markus Schnell; Bernd Edler; Stefan Reuschl
Applicant: Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V.
Main IPC class:
Description:

[0001] [001] The present invention relates to the processing of audio or image signals and, in particular, to the encoding or decoding of audio or image signals in the presence of transients.
[0002] [002] Contemporary frequency-domain speech/audio coding schemes based on overlapping FFTs or the modified discrete cosine transform (MDCT) offer some degree of adaptation to non-stationary signal characteristics. The general-purpose codecs standardized in MPEG, namely MPEG-1 Layer 3 (better known as MP3), MPEG-4 (HE-)AAC [1] and, more recently, MPEG-D xHE-AAC (USAC), as well as the Opus/Celt codec specified by the IETF [2], allow a frame to be coded using one of at least two different transformation lengths: one long transformation of length M for stationary signal passages, or 8 short transformations of length M/8 each. In the case of the MPEG codecs, switching from long to short and from short to long transformations (also known as block switching) requires the use of transition transformations with asymmetric windows, namely a start window and a stop window, respectively. These transformation shapes, together with other known prior art shapes, are shown in Figure 16. Note that the linear overlap slope is merely illustrative and varies in its exact shape. The AAC standard [1] and section 6 of [3] give possible window shapes.
[0003] [003] Given that, if the upcoming frame is to be coded with short transformations by an MPEG encoder, the current frame must be coded with a start transition transformation, it is evident that an encoder implemented according to one of the above-mentioned MPEG standards requires a look-ahead of at least one frame length. In low-delay communication applications, however, it is desirable to minimize or even avoid this additional look-ahead. To this end, two modifications to the general-purpose coding paradigm have been proposed. One, which was adopted e.g. in Celt [2], is to reduce the overlap of the long transformation to that of the short transformation, so that asymmetric transition windows are avoided. The other modification, which is used e.g. in the MPEG-4 (Enhanced) Low Delay AAC coding schemes, is to disallow switching to shorter transformations and instead to rely on a Temporal Noise Shaping (TNS) coding tool [4] that operates on the long-transformation coefficients to minimize the temporal spread of the coding error around transients.
[0004] [004] In addition, like xHE-AAC, Low Delay AAC allows the use of two frame overlap widths: the default 50% overlap for stationary input, or a reduced overlap (similar to the short overlap of the transition transformations) for non-stationary signals. The reduced overlap effectively limits the time extent of a transformation and, therefore, of its coding error when the coefficients are quantized.
[0005] [005] US 2008/0140428 A1, assigned to Samsung Electronics Co., as well as US 5502789 and US 5819214, assigned to Sony Corp., disclose signal-adaptive window or transformation size determination units. However, the transformer units controlled by these window or transformation size determination units operate on QMF or LOT subband values (implying that the systems described employ cascaded filter banks or transformations), whereas the present invention operates directly on the full-band time-domain input signal. In addition, US 2008/0140428 A1 gives no details about the shape or control of the window overlap, and in US 5819214 the overlap shapes follow, that is, they are the result of, the output of the transformation size determination unit, which is the opposite of what is proposed by a preferred embodiment of the present invention.
[0006] [006] US 2010/0076754 A1, assigned to France Telecom, follows the same motivation as the present invention, namely to be able to switch the transformation length in communication coding scenarios in order to improve the coding of transient signal segments, and to do so without extra encoder look-ahead. However, whereas that document discloses that the low-delay goal is achieved by avoiding transformation-length transition windows and by post-processing the reconstructed signal in the decoder (disadvantageously by amplifying parts of the decoded signal and, therefore, of the coding error), the present invention proposes a simple modification of the transition window of a prior art system, introduced below, in order to minimize the additional encoder look-ahead and to avoid special (risky) decoder post-processing.
[0007] [007] The transition transformation to which the inventive modification is to be applied is the start window described in two variants in US 5848391, assigned to Fraunhofer-Gesellschaft e.V. and Dolby Laboratories Licensing Corp., as well as, in a slightly different form, in US 2006/0122825 A1, assigned to Samsung Electronics Co. Figure 16 shows these start windows and reveals that the difference between the Fraunhofer/Dolby windows and the Samsung window is the presence of a non-overlapping segment, that is, a region of the window that has a constant maximum value and does not belong to any overlap slope. The Fraunhofer/Dolby windows exhibit such a "non-overlapping part of nonzero length", whereas the Samsung windows do not. It follows that an encoder with the least amount of additional look-ahead, but using prior art transition transformations, can be realized with the Samsung transition window approach. With these transformations, a look-ahead equal to the overlap width between the short transformations is sufficient to switch completely from long to short transformations early enough before a signal transient.
[0008] [008] Further prior art can be found in WO 90/09063, in "Coding of audio signals with overlapping block transform and adaptive window functions", Frequenz, volume 43, September 1989, pages 2052 to 2056, and in AES Convention Paper 4929, "MPEG-4 Low Delay Audio Coding based on the AAC Codec", E. Allamanche et al., 106th Convention, 1999.
[0009] [009] However, depending on the length of the short transformation, the look-ahead can remain quite large and should not be avoided. Figure 17 illustrates the block-switching performance in the worst-case input situation, namely the presence of a sudden transient at the start of the look-ahead zone, which in turn begins at the end of the long slope, that is, the overlap zone between the frames. With the prior art approaches, at least one of the two depicted transients falls within the transition transformation. In a lossy coding system using an encoder without additional look-ahead - an encoder that does not "see the transient coming" - this condition causes the coding error to spread in time up to the start of the long slope and, even when TNS is used, pre-echo noise is likely to be audible in the decoded signal.
[0010] [010] The two previously mentioned look-ahead workarounds have their disadvantages. On the one hand, reducing the long-transformation overlap by a factor of up to 8, as done in the Celt coder, severely limits the efficiency (that is, the coding gain and spectral compaction) on stationary, especially highly tonal, input material. On the other hand, prohibiting short transformations, as in (Enhanced) Low Delay AAC, reduces codec performance on strong transients with durations much shorter than the frame length, often leading to audible pre-echo or post-echo noise even when TNS is used.
[0011] [011] Thus, the prior art procedures for determining a window sequence are sub-optimal with respect to flexibility, due to restricted window lengths; sub-optimal with respect to the required delay, due to the minimum transient look-ahead periods required; sub-optimal with respect to audio quality, due to pre-echoes and post-echoes; sub-optimal with respect to efficiency, due to potentially necessary additional pre-processing using functionalities beyond windowing with certain windows; or sub-optimal with respect to flexibility and efficiency, due to the potential need to change a frame/block raster in the presence of a transient.
[0012] [012] An object of the present invention is to provide an improved audio coding / decoding concept that provides better performance in relation to at least one of the disadvantages of the prior art.
[0013] [013] This object is achieved by an apparatus for encoding an audio or image signal of claim 1, an apparatus for decoding an audio or image signal of claim 17, a method of encoding an audio or image signal of claim 32, a method of decoding an audio or image signal of claim 33, or a computer program of claim 34.
[0014] [014] Aspects of the present invention are based on the finding that, in order for a low-delay audio or image codec to approach the coding quality of general-purpose codecs, it is useful to maintain a high overlap percentage between long transformations during stationary signal inputs and to allow immediate switching to shorter overlaps and transformations in the parts of the audio or image signal surrounding signal non-stationarities. In addition, it is desirable to allow somewhat greater flexibility than just a binary choice with respect to the overlap width and, additionally or alternatively, with respect to the transformation lengths, so that the overlap width or the length(s) of the transformation(s) within a frame can be precisely adapted to the location of a possible transient within the temporal zone of the frame in order to minimize pre-echoes or other artifacts.
[0015] [015] Specifically, a transient location detector is configured to identify the location of a transient within a transient look-ahead zone of a frame and, based on the location of the transient within the frame, a specific window is selected from a group of at least three windows, where these three windows differ in their overlap lengths with the corresponding adjacent windows. The first window has an overlap length greater than that of the second window, and the second window has an overlap length greater than that of the third window, where the third window may alternatively have zero overlap, that is, no overlap. The specific window is selected based on the location of the transient so that one of two time-adjacent overlapping windows has first window coefficients at the location of the transient and the other of the two time-adjacent overlapping windows has second window coefficients at the location of the transient, where the second coefficients are at least nine times greater than the first coefficients. This ensures that the transient is sufficiently suppressed with respect to the first window and sufficiently captured with respect to the second window. In other words, and preferably, the earlier window already has values close to zero at the location where the transient was detected and the second window has window coefficients close to one in this zone, so that, for at least a part of the transient, the transient is suppressed in the earlier window and is not suppressed in the later, following window.
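The selection rule of this paragraph can be illustrated with a minimal sketch, which is not part of the specification. It assumes sine-shaped overlap slopes centred on the frame boundary and hypothetical candidate overlap lengths of 256, 128 and 16 samples; the names `slope_pair` and `select_overlap` are likewise illustrative assumptions. The sketch picks the longest candidate overlap for which one window's coefficient at the transient location is at least nine times the other's, and falls back to zero overlap otherwise.

```python
import math

def slope_pair(t, boundary, overlap):
    """Return (falling, rising) coefficients of two adjacent windows at sample t.

    Assumption: sine-shaped slopes, overlap zone centred on the frame boundary.
    """
    if overlap == 0:
        return (1.0, 0.0) if t < boundary else (0.0, 1.0)
    start = boundary - overlap // 2
    if t < start:
        return 1.0, 0.0          # only the earlier window is active
    if t >= start + overlap:
        return 0.0, 1.0          # only the later window is active
    phase = math.pi / 2 * (t - start + 0.5) / overlap
    return math.cos(phase), math.sin(phase)

def select_overlap(transient, boundary, candidates=(256, 128, 16), ratio=9.0):
    """Pick the longest overlap whose coefficients differ by at least `ratio`."""
    for overlap in sorted(candidates, reverse=True):
        falling, rising = slope_pair(transient, boundary, overlap)
        lo, hi = sorted((abs(falling), abs(rising)))
        if hi >= ratio * lo:
            return overlap
    return 0  # hypothetical fallback: no overlap at all
```

With a frame boundary at sample 512, a transient far from the boundary keeps the longest overlap, while a transient on the boundary itself forces the zero-overlap fallback in this sketch.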
[0016] [016] In an implementation, the overlap lengths differ by integer factors, so that the second overlap length is, for example, equal to half the first overlap length, and the third overlap length is equal to half the second overlap length or differs from the second overlap length by a different factor but is greater than or equal to 64 samples, greater than or equal to 32 samples, or greater than or equal to 16 audio or image samples.
[0017] [017] The window selection derived from the location of the transient is transmitted together with the frames of the audio or image signal, so that a decoder can select the synthesis windows matching the encoder's selection of analysis windows, ensuring that the encoder and decoder are synchronized throughout the encoding/decoding operation.
[0018] [018] In one implementation, a controllable window manager, a converter, a transient location detector and a controller form an apparatus for encoding, and the converter applies any of the well-known aliasing-introducing transformations, such as an MDCT (modified discrete cosine transform), an MDST (modified discrete sine transform) or any other similar transformation. On the decoder side, a processor cooperates with a controllable converter to convert a sequence of blocks of spectral values into a time-domain representation using overlap-add processing in accordance with the window sequences indicated by window information received by the decoder.
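As a hedged illustration of why overlap-add processing recovers the signal, the following sketch runs a plain MDCT analysis and synthesis chain in Python. It assumes the standard MDCT definition, a sine window with 50% overlap applied on both the analysis and the synthesis side, and a 2/N scaling of the inverse transformation; none of these choices is prescribed by the text, and the function names are illustrative. The time-domain aliasing (the "distortion" introduced by such a transformation) cancels when neighbouring blocks are added.

```python
import math

def sine_window(length):
    # Satisfies w[n]^2 + w[n + length/2]^2 = 1 (Princen-Bradley condition).
    return [math.sin(math.pi * (n + 0.5) / length) for n in range(length)]

def mdct(block):
    n = len(block) // 2
    return [sum(block[t] * math.cos(math.pi / n * (t + 0.5 + n / 2) * (k + 0.5))
                for t in range(2 * n)) for k in range(n)]

def imdct(coeffs):
    n = len(coeffs)
    return [2.0 / n * sum(coeffs[k] * math.cos(math.pi / n * (t + 0.5 + n / 2) * (k + 0.5))
                          for k in range(n)) for t in range(2 * n)]

def analysis_synthesis(signal, n):
    """Window, transform, inverse-transform, window again and overlap-add."""
    win = sine_window(2 * n)
    out = [0.0] * len(signal)
    for start in range(0, len(signal) - 2 * n + 1, n):
        block = [signal[start + t] * win[t] for t in range(2 * n)]
        rec = imdct(mdct(block))
        for t in range(2 * n):
            out[start + t] += rec[t] * win[t]
    return out
```

Only samples covered by two overlapping blocks are reconstructed exactly; the first and last half-block of the output still contain uncancelled aliasing.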
[0019] [019] Depending on the implementation, switching of the transformation length may be implemented in addition to the selection of the transformation overlap, again based on the location of the transient within the frame. By implementing a multiple overlap section, in which at least three windows overlap each other, a very low-delay codec concept is realized, which again substantially reduces the required transient look-ahead delay compared to earlier concepts. In another implementation, it is preferable to first select the overlap and then decide on the transformation length in order to determine an overlap code for each frame. Alternatively, the transformation length switching decision can be made independently of the overlap width decision, and an overlap code is determined based on these two decisions. Based on the overlap code for a current frame and the overlap code for a previous frame, the window sequence for a specific transient is selected, on the basis of which an encoder and a decoder operate in sync with each other.
[0020] [020] In another aspect, a window sequence controller, a preprocessor and a spectrum converter together constitute an apparatus for creating an encoded signal, in which three windows have a multiple overlap part. This multiple overlap part, in which not just two windows as in the prior art but three windows overlap each other, allows for a very low-delay concept because the delay required for transient look-ahead is further reduced. A corresponding decoder consists of a decoder processor, a time converter and a post-processor. The post-processor and the preprocessor perform additional windowing operations, using the same auxiliary window on the encoder side and on the decoder side, so that an efficient implementation can be achieved, particularly on mobile or low-cost devices, where the required ROM or RAM storage should be as small as possible.
[0021] [021] The preferred embodiments rely on a specific window sequence and on a specific interaction of windows of different lengths, so that a short window is "placed" at the transient in order to avoid long pre-echoes or post-echoes. To ensure that the multiple overlap part does not result in audio or image artifacts, the preprocessor on the encoder side performs a windowing operation using the auxiliary window function and a pre-processing operation using a folding operation to obtain a modified multiple overlap part, which is then transformed into the spectral domain using an aliasing-introducing transformation. On the decoder side, a corresponding post-processor is configured to perform an unfolding operation after the corresponding transformations into the time representation; after the unfolding operation, windowing using the auxiliary window function is performed, followed by a final overlap-add with a preceding block of samples originating from a window operation with a long window.
[0022] [022] In an embodiment in which a transformation overlap selection is performed, higher audio or image quality is obtained.
[0023] [023] Unlike existing coding systems, which employ only a binary choice of transformation overlap width (large/maximum or small), the embodiment proposes a set of three overlap widths from which an encoder can choose on a per-frame (or optionally, per-transformation) basis: maximum overlap, half overlap or minimum overlap. The maximum overlap can be equal to the frame length, as for long transformations in AAC, that is, 50% overlap, but it can also be equal to half the frame length, that is, 33% overlap, or less, as will be described in a preferred embodiment. Correspondingly, the minimum overlap can denote an overlap width of zero, that is, no overlap, but it can also represent a greater-than-zero overlap of a very small number of time samples or milliseconds, as said preferred embodiment will demonstrate. Finally, the half overlap can be, but need not be, half the maximum overlap.
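The 50% and 33% figures quoted above are consistent with one simple reading, sketched below as an assumption rather than as the specification's definition: a window that advances by one frame and overlaps its neighbour by a given width on each side spans the frame length plus the overlap width, and the percentage is the overlap width relative to that window span. The function name and the frame length of 1024 samples are illustrative.

```python
def overlap_fraction(frame_len, overlap):
    """Overlap width relative to the window span (frame_len + overlap).

    Assumption: the window extends overlap/2 beyond the frame on both sides.
    """
    return overlap / (frame_len + overlap)

frame_len = 1024  # hypothetical frame length in samples
for label, overlap in (("maximum", frame_len), ("half", frame_len // 2), ("minimum", 0)):
    print(f"{label} overlap: {overlap} samples, {overlap_fraction(frame_len, overlap):.0%}")
```

Under this reading an overlap equal to the frame length gives 50%, and an overlap of half the frame length gives about 33%.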
[0024] [024] In particular, according to one aspect of the present invention, an overlap width determination unit is defined which selects, for each frame (or optionally, for each transformation within a frame), one of the three possible overlap widths. More precisely, this overlap width determination unit takes as input the output of a transient detection unit in order to identify with sufficient accuracy the position of a transient within the current frame (or optionally, within a transformation in the current frame) and to derive an overlap width so that at least one of the following two objectives is achieved:
[0025] [025] - The width is chosen so that only one of the overlapping transformations contains the transient.
[0026] [026] - Pseudo-transients caused by time-aliased TNS shaping of the coding error are strongly suppressed.
[0027] [027] In other words, the overlap width is determined in order to prevent pre-echo or post-echo distortion around a perceptually coded transient located in the given frame. Note that some degree of freedom exists in how the exact location of the transient is determined. The time or sub-block index designating a transient location can equal the start (onset) of that transient, as in a preferred embodiment, but it can also be the location of the maximum energy or amplitude, or the center of energy, of the transient.
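The three ways of designating a transient location mentioned above can be contrasted in a short sketch. The onset rule used here (first sample whose energy reaches a fraction of the peak energy) and the function name `transient_location` are assumptions made only for illustration; the text does not specify a detection method.

```python
def transient_location(samples, mode="onset", threshold=0.5):
    """Return a sample index describing where the transient is located."""
    energy = [s * s for s in samples]
    peak = max(energy)
    if peak == 0.0:
        raise ValueError("silent input contains no transient")
    if mode == "onset":
        # Hypothetical onset rule: first sample reaching threshold * peak energy.
        return next(i for i, e in enumerate(energy) if e >= threshold * peak)
    if mode == "peak":
        return energy.index(peak)
    if mode == "centroid":
        # Center of energy, rounded to the nearest sample index.
        return round(sum(i * e for i, e in enumerate(energy)) / sum(energy))
    raise ValueError(f"unknown mode: {mode}")
```

For a short burst the three definitions usually differ by only a few samples, but which one is used changes which overlap width keeps the transient inside a single transformation.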
[0028] [028] Furthermore, unlike the prior art coding schemes, which derive the instantaneous inter-transformation overlaps from the given selection of transformation lengths for a pair of frames (that is, the overlap width follows the output of a transformation size determination unit), according to another aspect of the present invention a coding system can, under certain conditions that will be examined below in a preferred embodiment, control or derive the transformation length(s) to be used for a particular frame using the overlap width assigned to that frame and, optionally, the overlap width of the previous frame (that is, the transformation size follows the output of the overlap width determination unit).
[0029] [029] In another embodiment, in which a multiple overlap part is used or in which transformation length switching is applied, a particularly low-delay concept is obtained.
[0030] [030] An improvement over the prior art block-switching schemes is an advantageous modification to the transition transformations of Figure 16, which allows the additional encoder look-ahead required for stable-quality operation during signal non-stationarities to be reduced by half. As discussed above, the start windows proposed by Fraunhofer/Dolby or Samsung are characterized by the presence or absence, respectively, of a "non-overlapping part of nonzero length". The embodiment goes even further and allows the left and right overlap slopes of the transition window to extend into each other. In other words, the modified transition transformation exhibits a "double overlap" zone of nonzero length, in which it overlaps with both the long transformation of the preceding frame and the following short transformation. The resulting shape of the inventive transition transformation is illustrated in Figure 13. Compared to the Samsung transition window shown in Figure 17, it is clear that, by allowing a "double overlap" zone in the transformation, the short overlap slope at the right end of the transformation can be shifted to the left by half the width of the short transformation overlap, and the required encoder look-ahead can thus be reduced by the same amount. The reduced length of this modified transition window has three crucial advantages, which facilitate implementation, especially on mobile devices:
[0031] [031] - The transformation kernel, that is, the length of the coefficient vector resulting from the lapped time/frequency transformation (preferably the MDCT), is exactly half the width of the overlap zone between two long transformations. Given that this long overlap width is usually equal to the frame length or half the frame length, this implies that the inventive transition window and the subsequent short windows fit perfectly into the frame grid and that all transformation sizes of the resulting codec are related by an integer power-of-two factor, as can be seen in Figure 13.
[0032] [032] - Both transient locations shown in Figure 17 and again in Figure 13 lie outside the transition transformation, so that the temporal spread of the coding error due to the transients can be restricted to the extent of the first two short windows following the transition transformation. Therefore, unlike with the prior art Fraunhofer/Dolby and Samsung schemes, audible pre-echo noise around the transients is unlikely to occur when the inventive block-switching approach of Figure 13 is used.
[0033] [033] - Both the encoder and the decoder can use exactly the same windows for the forward and inverse transformations. In a communication device that performs both encoding and decoding, only one set of window data therefore has to be stored in ROM. Special pre-processing or post-processing of the signal, which would require additional program ROM and/or RAM, can also be avoided.
[0034] [034] Traditionally, transition windows with a "double overlap" segment, as in the present invention, have not been used in speech, audio or image coding, most likely because they were thought to violate certain principles that guarantee perfect waveform reconstruction in the absence of quantization of the transformation coefficients. However, it is possible to reconstruct the input exactly when using the inventive transition transformation and, moreover, no special decoder-side post-processing is required, unlike in the France Telecom proposal.
[0035] [035] It is worth emphasizing that the use of this inventive transition window can be controlled by the inventive overlap width determination unit, instead of or in addition to a transformation length determination unit.
[0036] [036] Subsequently, preferred embodiments of the present invention are discussed and illustrated in more detail. In addition, particular reference is made to the dependent claims, which define further embodiments.
[0037] [037] In addition, the specification specifically illustrates an aspect related to transient-location-adaptive overlap switching, particularly with respect to Figures 1a through 7. Another aspect, related to the multiple overlap part, is illustrated and described with respect to Figures 8a to 15f. These individual aspects can be implemented independently of each other, that is, the overlap switching can be applied without a multiple overlap zone, or the multiple overlap zone can be applied without the transient-location-adaptive overlap switching. In an implementation, however, both aspects can be advantageously combined, resulting in a coding/decoding concept with transient-location-adaptive overlap switching and a multiple overlap zone. Such a concept can be further improved by a transformation length switching procedure, again dependent on the location of a transient within a transient look-ahead zone of a frame. The transformation length switching can be performed depending on the overlap width determination or independently of the overlap switching.
[0038] [038] The present invention is useful not only for audio signals but also for video, picture or image signals in general. For example, in the coding of still images or of so-called I-frames in AVC or in less or more advanced technologies, the present invention can be applied to avoid blocking artifacts. A transient in the image domain would be a sharp edge, and a frame would correspond, for example, to a macroblock. The image is then preferably coded two-dimensionally, using an aliasing-introducing transformation and a corresponding spatial overlap. This reduces blocking artifacts on the one hand and, on the other hand, any other artifacts caused by transient parts, that is, parts with sharp edges. Therefore, the subsequent description applies equally to image signals, although this is not specifically indicated throughout the description.
[0039] [039] Embodiments and aspects are now discussed with reference to the accompanying drawings, in which:
[0040] [040] Fig. 1a illustrates an apparatus for encoding in the context of an overlap switching aspect;
[0041] [041] Fig. 1b illustrates an apparatus for decoding for the overlap switching aspect;
[0042] [042] Fig. 2a illustrates a sequence of windows with full overlap between adjacent windows;
[0043] [043] Fig. 2b illustrates a sequence of windows with half overlap between adjacent windows;
[0044] [044] Fig. 2c illustrates a sequence of windows with a quarter overlap between adjacent windows, a subsequent half overlap between adjacent windows and a subsequent full overlap between adjacent windows;
[0045] [045] Figures 3a to 3c illustrate different overlap widths for different transient locations for an embodiment with a transformation length of 20 ms, such as in TCX 20;
[0046] [046] Figures 4a through 4g illustrate a selection of transformation overlap lengths for a transformation length of 10 ms, such as in TCX 10, depending on the location of the transient;
[0047] [047] Figures 5a through 5c illustrate an overlap width coding;
[0048] [048] Fig. 6a shows an encoding of the overlap width and transformation length based on the position of the transient;
[0049] [049] Fig. 6b illustrates a decision table for the transformation length;
[0050] [050] Fig. 7 illustrates different window sequences depending on the previous and current overlap codes;
[0051] [051] Fig. 8a illustrates an encoder in the context of a multiple overlap part in an embodiment of the present invention;
[0052] [052] Fig. 8b illustrates a decoder for the multiple overlap part aspect in an embodiment of the present invention;
[0053] [053] Fig. 9a illustrates a procedure according to a preferred embodiment, showing the encoder side;
[0054] [054] Fig. 9b illustrates a flow chart of a preferred procedure on the encoder side;
[0055] [055] Fig. 10a illustrates an embodiment of a procedure on the decoder side;
[0056] [056] Fig. 10b illustrates another embodiment of a procedure performed on the decoder side;
[0057] [057] Fig. 11a illustrates operations performed on the encoder side of an embodiment;
[0058] [058] Fig. 11b illustrates operations performed by a decoder in an embodiment of the present invention;
[0059] [059] Figures 12a and 12b illustrate another embodiment of procedures to be performed on the encoder/decoder side in the context of the multiple overlap aspect of the invention;
[0060] [060] Fig. 13 illustrates different window sequences that all have a multiple overlapping part;
[0061] [061] Fig. 14a illustrates a sequence of windows with a changed transformation length depending on the location of the transient;
[0062] [062] Fig. 14b illustrates another sequence of windows with a multiple overlapping part;
[0063] [063] Figures 15a through 15f illustrate different window sequences and the corresponding look-ahead parts and pre-echoes;
[0064] [064] Fig. 16 illustrates prior art window shapes; and
[0065] [065] Fig. 17 illustrates sequences of prior art windows formed by window shapes of Fig. 16.
[0066] [066] Fig. 1a illustrates an apparatus for encoding an audio signal 100. The apparatus comprises a controllable window manager 102 for windowing the audio signal 100 to provide a sequence of blocks of windowed samples at 103. The encoder further comprises a converter 104 for converting the sequence of blocks of windowed samples 103 into a spectral representation comprising a sequence of frames of spectral values, indicated at 105. In addition, a transient location detector 106 is provided. The detector is configured to identify the location of a transient within a transient look-ahead zone of a frame. Furthermore, a controller 108 for controlling the controllable window manager is configured to apply a specific window with a specified overlap length to the audio signal 100 in response to an identified location of the transient, illustrated at 107. In an embodiment, the controller 108 is also configured to provide window information 112 not only to the controllable window manager 102 but also to an output interface 114, which provides, at its output, the encoded audio signal 115. The spectral representation comprising the sequence of frames of spectral values 105 is input into a coding processor 110, which can perform any type of coding operation, such as a prediction operation, a temporal noise shaping operation, a quantization operation, preferably with respect to a psycho-acoustic model or at least with respect to psycho-acoustic principles, or it can comprise a redundancy-reducing coding operation, such as a Huffman coding operation or an arithmetic coding operation. The output of the coding processor 110 is then forwarded to the output interface 114, and the output interface 114 finally provides the encoded audio signal, which has certain window information 112 associated with each encoded frame.
[0067] [067] Controller 108 is configured to select the specific window from a group of at least three windows. The group comprises a first window with a first overlap length, a second window with a second overlap length, and a third window with a third overlap length or with no overlap. The first overlap length is greater than the second overlap length, and the second overlap length is greater than zero overlap. The specific window is selected by the controllable window manager 102 based on the location of the transient so that one of two time-adjacent overlapping windows has first window coefficients at the location of the transient and the other of the two time-adjacent overlapping windows has second window coefficients at the location of the transient, the second coefficients being at least nine times greater than the first coefficients. This ensures that the transient is substantially suppressed by the first window, which has the first (small) coefficients, and is almost unaffected by the second window, which has the second window coefficients. Preferably, the second window coefficients are equal to 1 within a tolerance of plus/minus 5%, such as between 0.95 and 1.05, and the first window coefficients are preferably equal to 0 or at least smaller than 0.05. The window coefficients can also be negative, in which case the relations and quantities of the window coefficients refer to the absolute magnitude.
[0068] [068] Fig. 2a illustrates a sequence of windows consisting only of first windows, the first windows having the first overlap length. In particular, the last frame is associated with a first window 200, the current frame with window 202, and the third or next frame with window 204. In this embodiment, adjacent windows overlap by 50%, that is, by the full overlap length. In addition, the frames are placed relative to the windows to identify which part of the audio signal is processed by a frame. This is explained in relation to the current frame. The current frame has a left part 205a and a right part 205b. Correspondingly, the last frame has a right part 204b and a left part 204a. Similarly, the next frame has a left part 206a and a right part 206b. Left and right refer to earlier and later in time, as shown in Fig. 2a. When the current frame of spectral values is created, the audio samples obtained by windowing with window 202 are used. These audio samples come from parts 204b to 206a.
[0069] [069] As is known in the art of MDCT processing and, generally, of processing that uses an aliasing-introducing transformation, such a transformation can be separated into a folding step and a subsequent transformation step that uses a certain non-aliasing-introducing transformation. In the example in Fig. 2a, section 204b is folded into section 205a and section 206a is folded into section 205b. The result of the folding operation, that is, the weighted combination of 205a and 204b on the one hand and of 206a and 205b on the other hand, is then transformed into the spectral domain using a transformation such as a DCT. In the case of an MDCT, a DCT-IV transformation is applied.
[0070] [070] This is further exemplified with reference to the MDCT, but other aliasing-introducing transformations can be processed in a similar and analogous way. As a lapped transformation, the MDCT is somewhat unusual compared to other Fourier-related transformations in that it has half as many outputs as inputs (instead of the same number). In particular, it is a linear function $F: \mathbb{R}^{2N} \to \mathbb{R}^{N}$ (where $\mathbb{R}$ denotes the set of real numbers). The 2N real numbers $x_0, \ldots, x_{2N-1}$ are transformed into the N real numbers $X_0, \ldots, X_{N-1}$ according to the formula:
$$X_k = \sum_{n=0}^{2N-1} x_n \cos\left[\frac{\pi}{N}\left(n + \frac{1}{2} + \frac{N}{2}\right)\left(k + \frac{1}{2}\right)\right], \quad k = 0, \ldots, N-1$$
[0071] [071] (The normalization coefficient in front of this transformation, here unity, is an arbitrary convention and differs between treatments. Only the product of the normalizations of the MDCT and the IMDCT, below, is constrained.)
[0072] [072] Inverse transformation
[0073] [073] The inverse MDCT is known as the IMDCT. Because there are different numbers of inputs and outputs, at first glance it may appear that the MDCT should not be invertible. However, perfect invertibility is achieved by adding the overlapped IMDCTs of time-adjacent overlapping blocks, causing the errors to cancel and the original data to be restored; this technique is known as time-domain aliasing cancellation (TDAC).
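[0074] [074] The IMDCT transforms the N real numbers $X_0, \ldots, X_{N-1}$ into the 2N real numbers $y_0, \ldots, y_{2N-1}$ according to the formula:
$$y_n = \frac{1}{N} \sum_{k=0}^{N-1} X_k \cos\left[\frac{\pi}{N}\left(n + \frac{1}{2} + \frac{N}{2}\right)\left(k + \frac{1}{2}\right)\right], \quad n = 0, \ldots, 2N-1$$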
[0075] [075] (As for the DCT-IV, an orthogonal transformation, the inverse has the same form as the forward transformation.)
[0076] [076] In the case of a windowed MDCT with the usual window normalization (see below), the normalization coefficient in front of the IMDCT must be multiplied by 2 (i.e., it becomes 2/N).
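The following minimal sketch illustrates the overlap-add cancellation described above. It assumes the usual textbook definitions, namely an MDCT with unit normalization and a phase term n + 1/2 + N/2, and an IMDCT with normalization 1/N; the direct O(N²) summation and the test signal are chosen only for readability and are not taken from the specification.

```python
import math


def mdct(x):
    """Direct MDCT: 2N real inputs -> N real outputs (unit normalization)."""
    n = len(x) // 2
    return [
        sum(x[i] * math.cos(math.pi / n * (i + 0.5 + n / 2) * (k + 0.5))
            for i in range(2 * n))
        for k in range(n)
    ]


def imdct(spectrum):
    """Direct IMDCT: N real inputs -> 2N real outputs (normalization 1/N)."""
    n = len(spectrum)
    return [
        sum(spectrum[k] * math.cos(math.pi / n * (i + 0.5 + n / 2) * (k + 0.5))
            for k in range(n)) / n
        for i in range(2 * n)
    ]


N = 4
# Arbitrary test signal of 3N samples (illustrative values only).
signal = [0.3, -1.2, 0.8, 2.0, -0.5, 1.1, 0.7, -0.9, 1.5, 0.2, -0.4, 0.6]

y1 = imdct(mdct(signal[0:2 * N]))   # block covering samples 0 .. 2N-1
y2 = imdct(mdct(signal[N:3 * N]))   # block covering samples N .. 3N-1

# Overlap-add: second half of the first block plus first half of the second.
middle = [y1[N + i] + y2[i] for i in range(N)]
print(middle)
```

Each block alone is not invertible (N values cannot describe 2N samples), but the sum in the overlapping half reproduces `signal[N:2 * N]` up to rounding error.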
[0077] [077] In typical signal compression applications, the transformation properties are further improved by using a window function $w_n$ (n = 0, ..., 2N-1) that is multiplied with $x_n$ and $y_n$ in the MDCT and IMDCT formulas above, in order to avoid discontinuities at the boundaries n = 0 and n = 2N by making the function go smoothly to zero at those points. (That is, the data is windowed before the MDCT and after the IMDCT.) In principle, x and y can have different window functions, and the window function can also change from one block to the next (especially when data blocks of different sizes are combined), but for simplicity we consider the common case of identical window functions for blocks of the same size.
[0078] [078] The transformation remains invertible (that is, TDAC works) for a symmetric window $w_n = w_{2N-1-n}$, as long as w satisfies the Princen-Bradley condition:
$$w_n^2 + w_{n+N}^2 = 1$$
[0079] [079] Various window functions are used. A window that produces a form known as the modulated lapped transform [3][4] is given by
$$w_n = \sin\left[\frac{\pi}{2N}\left(n + \frac{1}{2}\right)\right]$$
[0080] [080] and is used for MP3 and MPEG-2 AAC, and
$$w_n = \sin\left(\frac{\pi}{2} \sin^2\left[\frac{\pi}{2N}\left(n + \frac{1}{2}\right)\right]\right)$$
[0081] [081] for Vorbis. AC-3 uses a Kaiser-Bessel derived (KBD) window, and MPEG-4 AAC can also use a KBD window.
[0082] [082] Note that the windows applied to MDCT are different from the windows used for some other types of signal analysis, since they have to satisfy the Princen-Bradley condition. One reason for this difference is that MDCT windows are applied twice, both for MDCT (analysis) and for IMDCT (synthesis).
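As a hedged numerical illustration of this point, the sketch below checks the Princen-Bradley condition (the squared coefficients at n and n + N sum to 1) for the sine window and the Vorbis window in their commonly cited forms. The Hann window is included only as an example, chosen here, of an ordinary analysis window that does not satisfy the condition; the function names are illustrative.

```python
import math


def sine_window(length):
    """Sine window of total length 2N (as commonly used with the MDCT)."""
    return [math.sin(math.pi / length * (n + 0.5)) for n in range(length)]


def vorbis_window(length):
    """Vorbis power-complementary window of total length 2N."""
    return [math.sin(math.pi / 2 * math.sin(math.pi / length * (n + 0.5)) ** 2)
            for n in range(length)]


def hann_window(length):
    """Hann window (half-sample offset); not designed for the MDCT."""
    return [math.sin(math.pi / length * (n + 0.5)) ** 2 for n in range(length)]


def princen_bradley_error(window):
    """Largest deviation of w[n]^2 + w[n+N]^2 from 1 over n = 0 .. N-1."""
    n = len(window) // 2
    return max(abs(window[i] ** 2 + window[i + n] ** 2 - 1.0) for i in range(n))


for name, make in (("sine", sine_window), ("vorbis", vorbis_window), ("hann", hann_window)):
    print(name, princen_bradley_error(make(16)))
```

The first two errors are at rounding level, while the Hann window deviates substantially, which is why it cannot be applied twice (analysis and synthesis) in a TDAC scheme without modification.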
[0083] [083] As can be seen by inspection of the definitions, for even N the MDCT is essentially equivalent to a DCT-IV in which the input is shifted by N/2 and two N-blocks of data are transformed at once. By examining this equivalence more carefully, important properties such as TDAC can be easily derived.
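[0084] [084] To define the precise relationship with the DCT-IV, it is necessary to realize that the DCT-IV corresponds to alternating even/odd boundary conditions: even at its left boundary (around n = −1/2), odd at its right boundary (around n = N − 1/2), and so on (instead of periodic boundaries as for a DFT). This follows from the identities
$$\cos\left[\frac{\pi}{N}\left(-n-1+\frac{1}{2}\right)\left(k+\frac{1}{2}\right)\right] = \cos\left[\frac{\pi}{N}\left(n+\frac{1}{2}\right)\left(k+\frac{1}{2}\right)\right]$$
and
$$\cos\left[\frac{\pi}{N}\left(2N-n-1+\frac{1}{2}\right)\left(k+\frac{1}{2}\right)\right] = -\cos\left[\frac{\pi}{N}\left(n+\frac{1}{2}\right)\left(k+\frac{1}{2}\right)\right].$$
[0085] [085] Therefore, if its inputs are an array x of length N, we can imagine extending this array to (x, −xR, −x, xR, ...) and so on, where xR denotes x in reverse order.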
[0086] [086] Consider an MDCT with 2N inputs and N outputs, in which we divide the inputs into four blocks (a, b, c, d), each of size N/2. If we shift these to the right by N/2 (from the +N/2 term in the definition of the MDCT), then (b, c, d) extend past the end of the N DCT-IV inputs, and we have to "fold" them back according to the boundary conditions described above.
[0087] [087] Thus, the MDCT of the 2N inputs (a, b, c, d) is exactly equivalent to a DCT-IV of the N inputs (−cR − d, a − bR), where R denotes reversal, as above.
[0088] [088] This is exemplified for the window function 202 in Fig. 2a. Here, a is part 204b, b is part 205a, c is part 205b and d is part 206a.
[0089] [089] (In this way, any algorithm for calculating DCT-IV can be trivially applied to MDCT.)
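The equivalence can be checked numerically. The sketch below assumes the unnormalized textbook definitions of the MDCT and the DCT-IV and folds the input (a, b, c, d) into (−cR − d, a − bR) before the DCT-IV; the helper names `fold` and `dct4` are illustrative and a fast DCT-IV could be substituted for the direct summation.

```python
import math


def mdct(x):
    """Direct MDCT: 2N real inputs -> N real outputs (unit normalization)."""
    n = len(x) // 2
    return [
        sum(x[i] * math.cos(math.pi / n * (i + 0.5 + n / 2) * (k + 0.5))
            for i in range(2 * n))
        for k in range(n)
    ]


def dct4(u):
    """Direct, unnormalized DCT-IV of N real inputs."""
    n = len(u)
    return [
        sum(u[i] * math.cos(math.pi / n * (i + 0.5) * (k + 0.5)) for i in range(n))
        for k in range(n)
    ]


def fold(x):
    """Fold 2N samples (a, b, c, d) into the N samples (-cR - d, a - bR)."""
    n = len(x) // 2
    h = n // 2
    a, b, c, d = x[:h], x[h:n], x[n:n + h], x[n + h:]
    first = [-c[h - 1 - i] - d[i] for i in range(h)]    # -cR - d
    second = [a[i] - b[h - 1 - i] for i in range(h)]    # a - bR
    return first + second


x = [0.3, -1.2, 0.8, 2.0, -0.5, 1.1, 0.7, -0.9]        # 2N = 8 illustrative samples
direct = mdct(x)
via_dct4 = dct4(fold(x))
print(max(abs(p - q) for p, q in zip(direct, via_dct4)))
```

The printed maximum difference is at rounding level, which is what allows any DCT-IV routine to be reused for the MDCT.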
[0090] [090] Similarly, the above IMDCT formula is precisely 1/2 of the DCT-IV (which is its own inverse), in which the output is extended (through the boundary conditions) to a length of 2N and shifted back to the left by N/2. The inverse DCT-IV simply returns the inputs (−cR − d, a − bR) mentioned above. When this is extended through the boundary conditions and shifted, one obtains:
[0091] [091] IMDCT (MDCT (a, b, c, d)) = (a − bR, b − aR, c + dR, d + cR) / 2.
[0092] [092] Half of the IMDCT outputs are therefore redundant, as b − aR = - (a − bR) R, and similarly for the last two terms. If we group the input into larger blocks A, B of size N, where A = (a, b) and B = (c, d), we can write this result in a simple way:
[0093] [093] IMDCT (MDCT (A, B)) = (A − AR, B + BR) / 2
[0094] [094] One can now understand how TDAC works. Suppose that one computes the MDCT of the time-adjacent, 50% overlapped, 2N block (B, C). The IMDCT then yields, analogously to the above, (B − BR, C + CR) / 2. When this is added to the previous IMDCT result in the overlapping half, the reversed terms cancel and one obtains simply B, recovering the original data.
[0095] [095] The origin of the term "time-domain aliasing cancellation" is now clear. The use of input data that extends beyond the boundaries of the logical DCT-IV causes the data to be aliased in the same way that frequencies beyond the Nyquist frequency are aliased to lower frequencies, except that this aliasing occurs in the time domain rather than in the frequency domain: we cannot distinguish the contributions of a and of bR to the MDCT of (a, b, c, d), or equivalently, to the result of IMDCT (MDCT (a, b, c, d)) = (a − bR, b − aR, c + dR, d + cR) / 2. The combinations c + dR, and so on, have precisely the right signs for the combinations to cancel when they are added.
[0096] [096] For odd N (which is rarely used in practice), N/2 is not an integer, so the MDCT is not simply a shift permutation of a DCT-IV. In this case, the additional shift by half a sample means that the MDCT/IMDCT becomes equivalent to the DCT-III/II, and the analysis is analogous to the above.
[0097] [097] We saw above that the MDCT of the 2N inputs (a, b, c, d) is equivalent to a DCT-IV of the N inputs (−cR − d, a − bR). The DCT-IV is designed for the case in which the function at the right boundary is odd, so that the values near the right boundary are close to 0. If the input signal is smooth, this is the case: the rightmost components of a and bR are consecutive in the input sequence (a, b, c, d), and therefore their difference is small. Let us look at the middle of the interval: if we rewrite the above expression as (−cR − d, a − bR) = (−d, a) − (b, c)R, the second term, (b, c)R, gives a smooth transition in the middle. However, in the first term, (−d, a), there is a potential discontinuity where the right end of −d meets the left end of a. This is the reason for using a window function that reduces the components near the boundaries of the input sequence (a, b, c, d) towards 0.
[0098] [098] Above, the TDAC property was proven for the ordinary MDCT, showing that adding the IMDCTs of time-adjacent blocks in their overlapping half recovers the original data. The derivation of this inverse property for the windowed MDCT is only slightly more complicated.
[0099] [099] Consider two overlapping consecutive sets of 2N inputs (A, B) and (B, C), for blocks A, B, C of size N. Recall from above that when (A, B) and (B, C) are MDCTed, IMDCTed, and added in their overlapping half, we obtain $(B + B_R)/2 + (B - B_R)/2 = B$, the original data.
[0100] [100] Now suppose that we multiply both the MDCT inputs and the IMDCT outputs by a window function of length 2N. As above, we assume a symmetric window function, which is therefore of the form $(W, W_R)$, where W is a vector of length N and R denotes reversal as before. Then the Princen-Bradley condition can be written as $W^2 + W_R^2 = (1, 1, \ldots)$, with the squares and additions performed elementwise.
[0101] [101] Therefore, instead of MDCTing (A, B), we now MDCT $(WA, W_R B)$, with all multiplications performed elementwise. When this is IMDCTed and multiplied again (elementwise) by the window function, the last-N half becomes:
[0102] [102] $W_R \cdot (W_R B + (W_R B)_R) = W_R \cdot (W_R B + W B_R) = W_R^2 B + W W_R B_R$
[0103] [103] (Note that we no longer have the multiplication by 1/2, because the normalization of IMDCT differs by a factor of 2 in the windowed case.)
[0104] [104] Similarly, the windowed MDCT and IMDCT of (B, C)
[0105] [105] yields, in its first-N half:
[0106] [106] $W \cdot (W B - W_R B_R) = W^2 B - W W_R B_R$
[0107] [107] When these two halves are added to each other, the mixed terms cancel and the sum is $(W_R^2 + W^2) B = B$ by the Princen-Bradley condition, so the original data is recovered.
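The windowed case can be verified with a short sketch. It assumes a sine window (which satisfies the Princen-Bradley condition), the textbook MDCT with unit normalization, and an IMDCT scaled by 2/N as stated for the windowed case; the function names and the test signal are illustrative only.

```python
import math


def mdct(x):
    """Direct MDCT: 2N real inputs -> N real outputs (unit normalization)."""
    n = len(x) // 2
    return [
        sum(x[i] * math.cos(math.pi / n * (i + 0.5 + n / 2) * (k + 0.5))
            for i in range(2 * n))
        for k in range(n)
    ]


def imdct(spectrum, scale):
    """Direct IMDCT: N real inputs -> 2N real outputs, multiplied by `scale`."""
    n = len(spectrum)
    return [
        scale * sum(spectrum[k] * math.cos(math.pi / n * (i + 0.5 + n / 2) * (k + 0.5))
                    for k in range(n))
        for i in range(2 * n)
    ]


def windowed_round_trip(block, window):
    """Window, MDCT, IMDCT with 2/N normalization, then window again."""
    n = len(block) // 2
    spectrum = mdct([w * s for w, s in zip(window, block)])
    return [w * y for w, y in zip(window, imdct(spectrum, 2.0 / n))]


N = 4
window = [math.sin(math.pi / (2 * N) * (i + 0.5)) for i in range(2 * N)]
signal = [0.3, -1.2, 0.8, 2.0, -0.5, 1.1, 0.7, -0.9, 1.5, 0.2, -0.4, 0.6]

first = windowed_round_trip(signal[0:2 * N], window)    # block (A, B)
second = windowed_round_trip(signal[N:3 * N], window)   # block (B, C)
recovered = [first[N + i] + second[i] for i in range(N)]
print(recovered)
```

The list `recovered` matches `signal[N:2 * N]` up to rounding error, with no extra factor of 1/2, consistent with the changed normalization in the windowed case.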
[0108] [108] In an identical procedure, the next frame is calculated using parts 205b, 206a, 206b and the first part of the frame after the next frame in Fig. 2a. Windows 200, 202, 204 thus correspond to the window function that has the first overlap length among the three windows with different overlap lengths used by the controllable window manager 102 of Fig. 1a. As stated, Fig. 2a illustrates a situation in which no transients are detected in the last frame, in the current frame and in the next frame and, specifically, in the anticipation zone of each frame, indicated by item 207 for the last frame, 208 for the current frame and 209 for the next frame. Fig. 2b illustrates a situation in which transients are detected at transient positions 210, 211, 212, 213. Because a transient position is detected, for example, at 210, and because 210 lies in the anticipation zone starting at 207 for the last frame, controller 108 determines that a change must be made from the first window 201 to another window 215. Due to the other transients 211 and, in particular, 212/213, which lie in the next anticipation zone, the current frame is additionally processed using the second window 216 with the second overlap length. Window 215 is therefore a kind of start window that changes from the window with the first overlap length, indicated at 201, to the second window, which has the second overlap length. As illustrated, the second overlap length extends over only eight slots and is therefore only half as long as the first overlap length. Because no more transients are detected in the anticipation zone that starts at 209, a switch back to the long window 201 is made through a kind of "stop window" 217. Again, note that the overlap length shown at 218 in the current frame on the one hand, and between the current frame and the next frame on the other hand, is half of the overlap length of the first window in Fig. 2a, which is illustrated as 16 slots.
[0109] [109] The half-overlap window is therefore used for transients that are detected in detection zones 1 and 6. As shown at 219, such a detection zone comprises two slots. The anticipation zone is therefore preferably divided into eight slots. However, a coarser or finer subdivision can also be made. In preferred embodiments, the anticipation zone is subdivided into at least four slots and preferably into eight slots, as shown in Figs. 2b and 2c and other figures.
[0110] [110] As illustrated, second window 216 has half overlap on both sides, while window 215 has half overlap on the right side and total overlap on the left side, and window 217 has half overlap on the left side and total overlap on the right side.
[0111] [111] Reference is made to Fig. 2c. Fig. 2c illustrates a situation in which the transient detector detects, in the anticipation zone that begins at the center of the last frame, that there is a transient in the second transient detection zone 222. Therefore, a switch is made to a quarter overlap to ensure that transient 223 is only "smeared" within window 224, without being included in the zone defined by window 201 or in the zone defined by window 225. In addition, a sequence is indicated in which a switch is made from a quarter overlap between the last frame and the current frame to a half overlap between the current frame and the next frame and back to the total overlap between the next frame and the frame after the next. This is due to the detected transients. In the anticipation zone starting at 208, transients are detected in part one and part six, while transients are detected in part two and part five between the last frame 207 and the current frame 208.
[0112] [112] Fig. 2c therefore illustrates a sequence of windows in which the first window 201 is shown, which has the first or total overlap length, in which a second window is used that has the second overlap length indicated at 218, where the second window can, for example, be window 225 or window 226, and in which a third window with a third overlap length is illustrated as window 224 or window 225, which has the small overlap length 229 on its left side. A sequence of windows is thus illustrated that changes from a total overlap to a quarter overlap, then to a half overlap, and then back to a total overlap. The first window with the first overlap length can therefore be an asymmetric window with an overlap different from the first overlap on one side and with the first overlap length on the other side. Alternatively, however, the first window can also be a window with the first overlap length on both sides, as shown at 216 in Fig. 2b. In addition, the second window with the second overlap length can be a symmetrical window with the second overlap length on both sides, or it can be an asymmetric window with the second overlap length on one side and, on the other side, with the first overlap length, the third overlap length or any other overlap length. Finally, the third window with the third overlap length can be a symmetrical window with the third overlap length on both sides, or it can be a window with the third overlap length on one side and a different overlap length on the other side.
[0113] [113] Subsequently, other models are illustrated in relation to the following figures. In general, the detection of the transient and its location can be done for example using a method or procedure similar to the transient detector described in U.S. Patent 6,826,525 B2, but any other transient detectors can also be used.
[0114] [114] The transient detection unit identifies the presence and, if applicable, the location of the onset of the strongest transient in the new signal part of a given frame, that is, excluding the overlap zone between the current frame and the previous frame. The resolution of the index that describes the location of the transient is, in the following figures, 1/8 of the frame length, so the index ranges from 0 to 7. In the subsequent figures, the sub-blocks with indices 0, …, 7 represent the most recent 20 ms of the time domain signal that are used for encoding the current frame.
[0115] [115] Figures 3a-3c illustrate the selection of the transformation overlap width for an exemplary transformation length of 20 ms, that is, for a TCX20 transformation length.
[0116] [116] In Fig. 3a there is no transient in the current frame. Therefore, a total overlap 300 is selected.
[0117] [117] Fig. 3b, on the other hand, illustrates a situation in which a transient is detected in the seventh sub-block, so that a half overlap 302 is selected by controller 108 of Fig. 1a. Fig. 3c illustrates the situation in which a transient is detected in the sixth sub-block and a minimal overlap 304 is therefore set by the controller. The transient location detector 106 thus detects whether a transient exists and, if not, the total overlap width or first overlap width 300 is selected. When, however, there is a transient in the seventh sub-block as determined by the transient location detector 106 of Fig. 1a, the second overlap length 302, which is preferably half of the first overlap length 300, is set by the controller, and when the transient is in sub-block 6, a minimal overlap is set. Fig. 3c additionally shows that, regardless of whether the transient is detected at location 6 or 7, the transformation length is nevertheless maintained. The transformation lengths of windows 301a, 301b or 303a, 303b are therefore identical and equal to that of the first window, which has the longest overlap length and is illustrated in Fig. 3a at 301a and 301b. As will be shown below, it is preferable not only to control the overlap length but additionally to control the transformation length, specifically in situations where the transient is detected in other sub-blocks. The overlap width between the current and the next transformation window thus depends on the location of the transient. The overlap between the current transformation window and the previous one, however, was already determined when the previous frame was processed.
[0118] [118] Subsequently, reference is made to Figs. 4a to 4g to show the selection of the transformation overlap length for a transformation length of 10 ms, that is, TCX10. If, for example, a codec is limited to a transformation length of 10 ms, the overlap between two TCX10 windows is chosen so that pseudo-transients caused by time-aliased TNS shaping of the coding error are strongly suppressed. In addition, smearing of the transient into more than five preceding and more than five following sub-blocks is minimized. That is, the pre-echo and post-echo are limited to 12.5 ms. The choice of overlap is based on the position of the transient.
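The 12.5 ms bound can be checked with a short calculation, assuming that the sub-block grid of 1/8 of a 20 ms frame, stated above for the transient position index, also applies here (the symbol $T_{\text{sub}}$ is introduced only for this check):

```latex
\[
T_{\text{sub}} = \frac{20\ \text{ms}}{8} = 2.5\ \text{ms},
\qquad
5 \cdot T_{\text{sub}} = 5 \cdot 2.5\ \text{ms} = 12.5\ \text{ms}
\]
```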
[0119] [119] Fig. 4a illustrates a situation in which a transient is detected in the 0th or 1st sub-block. In this case, the "first windows" 401, 402, which have the maximum or first overlap length 403, are chosen. In addition, for illustrative purposes, a total overlap of TCX20 with the previous window and the next window is illustrated at reference 404. The "total overlap" therefore corresponds to 50% of the window 401, 402 or, for example, 33% of the TCX20 window 301a, 301b. The overlap lengths 300 in Fig. 3a and 403 in Fig. 4a are therefore identical.
[0120] [120] Fig. 4b illustrates a situation where the transient is detected in the second sub-block and the controller then controls the window sequence, in order to choose a minimum overlap 404 corresponding to the “third overlap length” illustrated in 229 of Fig. 2c. Therefore, windows 406, 407 which are, in this model, asymmetric windows, are selected having the short overlap length corresponding to the “second window” in the language of Fig. 1a and 1b. In addition, when the transient is detected in the third sub-block, the second overlap length 405 is selected. Therefore, windows 408, 409 correspond to the third window that has the third overlap length 405, but are asymmetric windows.
[0121] [121] In addition, as shown in Fig. 4d, the total overlap length is determined when the transient is in transient part 4 and, therefore, the windows selected in this situation are the windows 401, 402 shown in Fig. 4a. When the overlap is chosen so that one of the overlapping transformations contains the transient as illustrated, the case in which the transient is in the second or third sub-block is illustrated in Fig. 4f or 4g, respectively. The cases in which the transient is in sub-block zero or in the first sub-block are then treated separately, as are the cases in which the transient is in the fourth or fifth sub-block. Reference is therefore made to Fig. 4e, which illustrates the situation in which the transient is in sub-block zero, yielding a sequence of windows as shown in Fig. 4e, in which there is a half overlap 405 that is then changed back to the total overlap 403. This is achieved by the sequence of windows formed by the start window 408, the stop window 409 and another window of normal length 402.
[0122] [122] Fig. 4f, on the other hand, illustrates the situation in which the transient is in the first sub-block, so that a short or third overlap length 404 is selected, which is made possible by the start window 406 and the stop window 407, which is then followed by a total overlap window 402. Window 408 or 409 in Fig. 4e thus illustrates the second window, which has the second overlap length 405, and windows 406 and 407 correspond to the third window, which has the third overlap length 404.
[0123] [123] Fig. 4g illustrates a situation where the transient is detected in the fourth sub-block. This situation is reflected by a first window 401 which has a total overlap length 403 and a second window 409 which has a half overlap length 405 and another second window 414 which has a second overlap length 405. The right side of the window 414, however, depends on the overlap length determined for the next frame, that is, the next anticipation zone that starts at the moment indicated by reference number 415.
[0124] [124] Figures 4a-4g therefore illustrate the situation in which the overlap length is determined so that the transient is located within only one window, which is ensured by the fact that, at the location of the transient, for example in sub-block 4, the coefficients of window 414 are equal to 0 and the coefficients of window 409 are equal to 1.
[0125] [125] Subsequently, reference is made to a preferred embodiment in which the transformation length is derived from the overlap length. Figures 5a, 5b, 5c illustrate three different overlap lengths 403, 405, 404, in which the total overlap length is determined by the two first windows indicated at 501 and 502. The half overlap length is obtained by two second windows, which have the second overlap length and are illustrated at 503 and 504, and the third overlap length 404 is obtained by two third windows 505 and 506, which have the third overlap length 404. The total overlap is preferably coded using the bit "0", the half overlap is coded using the bit combination "11", and the minimum overlap is coded using the bit combination "10".
[0126] [126] This encoding is therefore useful for determining the overlap width and the transformation length selection when using TCX-20 and a combination of TCX-5 and TCX-10 frames.
[0127] [127] Unlike coding schemes that derive the instantaneous inter-transformation overlaps from a given selection of transformation lengths for a pair of frames, that is, in which the overlap width follows the output of the transformation length determination, a preferred embodiment of the present invention relates to an encoding system that can control or derive the transformation length(s) to be used for a particular frame using the overlap width assigned to that frame and, optionally, the overlap width of a previous frame. That is, the transformation length follows the data of the overlap width determination unit or, in relation to Fig. 1a, results from the cooperation of the transient location detector 106 and the controller 108. Fig. 6a illustrates a coding table and Fig. 6b illustrates a corresponding decision table. In Figures 5a, 5b and 5c, the continuous line represents the right half of the window of the last transformation in the current frame and the dashed line represents the left half of the window of the first transformation in the next frame.
[0128] [128] Fig. 6a illustrates the encoding of the overlap and the transformation length based on the position of the transient. In particular, the short/long transformation decision is encoded using 1 bit, as indicated in column 600, and the overlap with the first window of the next frame is encoded using a variable length code with 1 or 2 bits, as illustrated in column 602. The code for the short/long transformation decision 600 on the one hand and the binary code for the overlap width of column 602 on the other hand are concatenated to obtain the so-called overlap code in column 603. In addition, the overlap with the first window of the next frame is determined by controller 108 depending on the transient position index of column 604, as determined by transient detector 106. In contrast to the previous illustrations, the transient position index has an extended anticipation range that starts two slots earlier, indicated by -1 and -2, and for these positions the total overlap is likewise signaled in this embodiment.
[0129] [129] The total overlap is therefore signaled for "no transient" or for a transient position between -2 and 1. In addition, as shown in column 605, a half overlap is signaled for transient positions 2, 3 and 7, and the minimum overlap is signaled for transient positions 4, 5 and 6.
[0130] [130] Therefore, the index “-2” in Fig. 6a means that there was a transient in the previous frame at position 6, and “-1” means that there was a transient in the previous frame at position 7. As stated, “None” means that no transients have been detected in the transient anticipation zone.
[0131] [131] As outlined, the short/long transformation decision and the overlap width are coded together using the overlap code. The overlap code consists of 1 bit for the short/long transformation decision and the binary code for the overlap width, encoded with 1 or 2 bits. The code is a variable length code in which it is automatically detected where a codeword starts and where the previous codeword ends. The codes for the short/long transformation decision and for the overlap width are defined in Fig. 6a. For example, when the short/long transformation decision yields 1 and the minimum overlap is selected, that is, the binary code is equal to 10, the overlap code is 110.
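The construction and the self-delimiting property of the overlap code can be sketched as follows. The overlap bit patterns ("0", "11", "10") are those given in the text; the assumption that a decision bit of 1 denotes the short transformation is inferred from the examples in the description, and the function names are hypothetical.

```python
# Overlap width codes as stated in the text: total "0", half "11", minimum "10".
OVERLAP_BITS = {"total": "0", "half": "11", "minimum": "10"}


def overlap_code(short_transform, overlap):
    """Concatenate the 1-bit short/long decision with the overlap width code.

    Assumption: bit 1 signals a short transformation, bit 0 a long one.
    """
    return ("1" if short_transform else "0") + OVERLAP_BITS[overlap]


def parse_overlap_codes(bits):
    """Split a concatenated bit string into (short_transform, overlap) pairs.

    No separators are needed: after the decision bit, a "0" is a complete
    overlap code, while a "1" announces exactly one more bit.
    """
    decoded = []
    pos = 0
    while pos < len(bits):
        short_transform = bits[pos] == "1"
        pos += 1
        if bits[pos] == "0":
            overlap = "total"
            pos += 1
        else:
            overlap = "half" if bits[pos + 1] == "1" else "minimum"
            pos += 2
        decoded.append((short_transform, overlap))
    return decoded


print(overlap_code(True, "minimum"))
print(parse_overlap_codes("110" + "00" + "011"))
```

The first call reproduces the example from the text (decision bit 1 and minimum overlap give 110); the second shows that three concatenated codewords of lengths 3, 2 and 3 are split without ambiguity.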
[0132] [132] In addition, Fig. 6a illustrates the situation in which a short transformation decision is made for all transient positions between -2 and 5, and in which a long transformation is chosen when there is no transient or when the transient is at position 6 or 7. Fig. 6a therefore illustrates the situation in which the transient location detector can detect a certain transient at a certain position and in which, independently of one another or in parallel, the short/long transformation decision and the overlap with the first window of the next frame can be determined, that is, the complete overlap code 603 can be derived. It is emphasized that those skilled in the art will understand that any other codes can be used to encode different short/long transformations and different overlaps. In addition, more than two, i.e., three or even more, transformation lengths can be determined and signaled, and at the same time more than three overlaps, such as four or five different overlap lengths, can be determined and encoded. All of this is determined, for example, in response to a transient location detector that operates on at least four different divisions per frame or, as in the embodiment, on eight divisions per frame or, for a finer decision, on even more divisions, such as sixteen divisions per frame.
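The mapping from the transient position index to the two decisions, as described in the text for Fig. 6a, can be summarized in a short sketch. The function name `decide` and the use of `None` for "no transient" are illustrative choices, not part of the specification.

```python
def decide(position):
    """Map a transient position index to (short_transform, overlap).

    `position` is None when no transient was detected, otherwise an
    integer from -2 to 7 (hypothetical helper following the description
    of Fig. 6a in the text).
    """
    if position is not None and not -2 <= position <= 7:
        raise ValueError("transient position index out of range")

    # Overlap with the first window of the next frame.
    if position is None or position <= 1:
        overlap = "total"
    elif position in (2, 3, 7):
        overlap = "half"
    else:                       # positions 4, 5 and 6
        overlap = "minimum"

    # Short transformation for positions -2 .. 5, long otherwise.
    short_transform = position is not None and position <= 5
    return short_transform, overlap


for pos in (None, -2, 3, 4, 6, 7):
    print(pos, decide(pos))
```

Both decisions depend only on the position index, so they can be evaluated independently or in parallel, as stated above.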
[0133] [133] Based on the overlap codes for the current frame and the previous frame, a decision is made on the combination of transformation lengths to use, as shown in Fig. 6b. Fig. 6b thus illustrates the decision on the transformation length based on the previous overlap code and the current overlap code. For example, if the previous overlap code and the current overlap code are both "00", a window such as 401 is used. If the previous overlap code was 10 and the current overlap code is 00, the same window is selected. However, if the previous code is 111, that is, a half overlap code, and the current overlap code is 00, window 409 of Fig. 4c, for example, is selected. For a previous overlap code of 110 and a current overlap code of 00, a long transformation is selected again, but with a window identical to window 407, and the situation is the same for a previous overlap code of 010 and a current overlap code of 00, that is, window 407 of Fig. 4f is selected. Finally, for a previous overlap code of 011 and a current overlap code of 00, a window such as 409 in Fig. 4e is selected.
[0134] [134] Other windows are selected for other combinations, as specifically illustrated in Fig. 7. Fig. 7 illustrates some of the transformation length combinations together with the position of the transient in the current frame and with the overlap codes for the current frame and the previous frame. The notation 110/010 - 111 in Fig. 7 means that the previous overlap code is 110 or 010 and that the current overlap code is 111. Fig. 7 therefore illustrates different combinations. For example, the image at the upper left of Fig. 7 illustrates a minimal overlap at the beginning of a sequence of two TCX-5 transformations and a following TCX-10 transformation that has the total overlap. In contrast, the image below it illustrates a minimal overlap followed by four TCX-5 windows, where the fourth TCX-5 window has a half overlap, and so on. Accordingly, reference numbers 700, 701 illustrate a sequence of two TCX-5 or short windows followed by a medium window. Similarly, reference numbers 702, 703, 704, 705, 706, 707 illustrate a situation with four short transformation lengths or "TCX-5" transformations, while reference numbers 708, 709, 710, 711 illustrate the situation in which, at the beginning of the sequence, there is a window of medium transformation length, such as a TCX-10 window, followed by two TCX-5 or short transformation length windows. The sequences 700 to 711 in Fig. 7 can be preceded by other such sequences or by TCX-20 or long transformation length windows that have different overlaps, such as short overlaps at 700, 702, for example, a medium overlap at 704 or long overlaps at 708 or 710, for example. At the same time, the sequence can be followed by other such sequences or by TCX-20, that is, long transformation windows, but with a different overlap length. 
Therefore, sequence 700, for example, ends with a long overlap and sequence 702, for example, ends with a medium overlap or sequence 706, for example, ends with a short overlap length.
[0135] [135] As shown in Fig. 1a, the window information, that is, the overlap code 603 of Fig. 6a, shown at 112 in Fig. 1a, can be associated with each encoded frame by an output interface 114.
[0136] [136] In addition, the transformation applied in converter 104 can be an MDCT, an MDST or a different distortion-introducing transformation, which is characterized in that the number of spectral values in a block of spectral values is smaller than the number of windowed samples in the block of windowed samples input into the transformation or, on the decoder side, in that the number of time domain output samples is greater than the number of spectral values input into the inverse distortion-introducing transformation.
[0137] [137] As shown in all of Figures 2 to 7, a constant frame grid is maintained. Controller 108 thus ensures that, despite changes to shorter transformation lengths as shown in Fig. 7, the same frame grid is always maintained. This is guaranteed by exclusively using specific windows that always result in a similar transformation length for each class of windows in the context of the correct overlap size. Therefore, each TCX-5 transformation length is defined to have overlap zones and a constant zone between the two overlap zones such that the transformation results in N / 4 spectral values, where N is the number of spectral values within a frame. The shape and size, and specifically the overlap lengths, of the TCX-20 transformation windows are additionally designed so that this window results in N spectral samples after transformation.
[0138] [138] Fig. 1c illustrates a preferred implementation of the controllable converter 158 on the decoder side. In particular, controllable converter 158 comprises a frequency-time converter 170, a subsequently connected synthesis window manager 172 and a final overlap adder 174. Specifically, the frequency-time converter performs a transformation such as a DCT-IV transformation and a subsequent unfolding operation, so that the output of the frequency-time converter 170 has, for a first or long window, 2N samples, while the input into the frequency-time converter was, for example, N spectral values. On the other hand, when the input into the frequency-time converter is N / 8 spectral values, the output is, for an MDCT operation, N / 4 time domain values, as an example.
[0139] [139] Then, the output of the frequency-time converter 170 is input into the synthesis window manager 172, which applies the synthesis window, which is preferably exactly the same as the window on the encoder side. Therefore, each sample is, before the overlap addition is performed, windowed by two windows, so that the resulting “total windowing” is the square of the corresponding window coefficients and the Princen-Bradley condition is fulfilled, as discussed above.
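The squared-window argument can be checked numerically. The following is a minimal sketch, not the implementation of the invention: it assumes a symmetric sine window (which satisfies the Princen-Bradley condition), a fixed transformation length, and a naive DCT-IV; the function names `fold`, `unfold` and `encode_decode` are hypothetical. It shows N spectral values being unfolded into 2N samples and the distortion cancelling in the overlap addition when the same window is applied on both sides.

```python
import math

def sine_window(length):
    # Symmetric sine window (an assumption); w[n]^2 + w[n + length/2]^2 == 1.
    return [math.sin(math.pi * (n + 0.5) / length) for n in range(length)]

def dct_iv(x):
    # Naive O(N^2) DCT-IV; applying it twice scales the input by N/2.
    n = len(x)
    return [sum(x[j] * math.cos(math.pi / n * (j + 0.5) * (k + 0.5))
                for j in range(n)) for k in range(n)]

def fold(block):
    # Fold 2N samples (a, b, c, d) into N values (-c_r - d, a - b_r).
    q = len(block) // 4
    a, b, c, d = (block[i * q:(i + 1) * q] for i in range(4))
    u = [-c[q - 1 - i] - d[i] for i in range(q)]
    v = [a[i] - b[q - 1 - i] for i in range(q)]
    return u + v

def unfold(folded):
    # Unfold N values (u, v) into 2N aliased samples (v, -v_r, -u_r, -u).
    q = len(folded) // 2
    u, v = folded[:q], folded[q:]
    return (v + [-s for s in reversed(v)]
            + [-s for s in reversed(u)] + [-s for s in u])

def encode_decode(signal, n):
    # Analysis: window, fold, DCT-IV. Synthesis: DCT-IV, unfold, window, overlap-add.
    w = sine_window(2 * n)
    out = [0.0] * len(signal)
    for start in range(0, len(signal) - 2 * n + 1, n):
        windowed = [w[i] * signal[start + i] for i in range(2 * n)]
        spectrum = dct_iv(fold(windowed))  # n spectral values
        aliased = unfold([2.0 / n * s for s in dct_iv(spectrum)])  # 2n samples
        for i in range(2 * n):
            out[start + i] += w[i] * aliased[i]  # same window as in the analysis
    return out
```

Apart from the first and last n samples, which are covered by only one block, the output of `encode_decode` matches the input up to rounding error.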
[0140] [140] Finally, the overlap adder 174 performs the corresponding correct overlap addition to finally obtain the decoded audio signal at output 175. In particular, the frequency-time converter 170, the synthesis window manager 172 and the overlap adder 174 are controllable and are controlled, for example, by the overlap code 603 discussed in the context of Fig. 6a or by any other information relating to the situation discussed in the context of Fig. 6b. Preferably, however, the corresponding transformation length for the frequency-time converter is determined based on the previous overlap code and the current overlap code using the transformation length decision table. In addition, the size / shape of the window is also determined based on the previous overlap code and the current overlap code, and the same is true for the overlap adder, so that the overlap adder applies the maximum overlap, the medium overlap or the minimum overlap as signaled.
[0141] [141] Therefore, it is preferable for controller 180 in the decoder in Fig. 1c to receive the overlap codes, that is, the previous overlap code 606 and the current overlap code 607, and to determine, from this information, the overlap and the window for the block of spectral values.
[0142] [142] Therefore, each window and the corresponding transformation size associated with the window are determined. In preferred models where an MDCT is used as a transformation and an inverse MDCT is used for the reverse transformation, the window size is twice the transformation length or the transformation length is half the window size.
[0143] [143] Fig. 1d illustrates another model of the present invention implemented in a mobile device, wherein the mobile device comprises, on the one hand, an encoder 195 and, on the other hand, a decoder 196. Furthermore, according to a preferred model of the present invention, both encoder 195 and decoder 196 retrieve the same window information from only a single memory 197, since the windows used in encoder 195 and the windows used in decoder 196 are identical to each other. Thus, the decoder has a read-only memory 197, a random access memory or generally any memory 197 that stores only a single set of windows or window sequences for use in both the encoder and the decoder. This is advantageous in that the window coefficients for the different windows do not have to be stored twice, with one set for the encoder and one set for the decoder. Instead, because identical windows and window sequences are used in the encoder and the decoder according to the present invention, only a single set of window coefficients has to be stored. Therefore, the memory usage of the device of the invention illustrated in Fig. 1d is substantially reduced in relation to a different concept, in which the encoder and the decoder have different windows or in which certain post-processing is performed using processing other than windowing operations.
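The single-memory idea can be sketched as follows. This is only an illustration under assumptions: the sine slope shape, the cache standing in for memory 197, and the names `window_slope`, `Encoder` and `Decoder` are hypothetical and do not come from the description.

```python
import math
from functools import lru_cache

@lru_cache(maxsize=None)
def window_slope(overlap_len):
    # One stored coefficient set per overlap length; the sine shape is an assumption.
    return tuple(math.sin(math.pi / 2 * (n + 0.5) / overlap_len)
                 for n in range(overlap_len))

class Encoder:
    def analysis_slope(self, overlap_len):
        # The analysis side reads from the shared store.
        return window_slope(overlap_len)

class Decoder:
    def synthesis_slope(self, overlap_len):
        # The synthesis side reads the very same stored coefficients.
        return window_slope(overlap_len)
```

Because both classes go through the same cached function, a given overlap length is computed and held only once, whichever side asks for it first.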
[0144] [144] Subsequently, reference is made to another preferred model relating to the transformation / transformation length change model.
[0145] [145] The adaptive transformation and overlap length encoding scheme outlined above was implemented in the transform coded excitation (TCX) path of the LD-USAC encoder, a low-delay variant of xHE-AAC [5] with a frame length of 20 ms, and tested at 48 kbit / s mono. At this configuration point, LD-USAC operates in TCX-only mode with a core frame length of 512 samples and a long transformation overlap of 256 samples, i.e. 33%, during (pseudo-)stationary input conditions. The encoder includes a transient detection unit, the output of which is input into a transformation length determination unit and into the inventive overlap width determination unit. Three transformation lengths are available for coding: a TCX-20 length with 512 MDCT coefficients, a TCX-10 length with 256 MDCT coefficients and a special TCX-5 length with 128 MDCT coefficients. Correspondingly, one of three overlap widths can be used and transmitted per frame: a maximum overlap of 256 core samples (10 ms), a half overlap of 128 core samples (5 ms) and a minimum overlap of 16 samples (0.6 ms). For each frame, the transformation lengths have to be selected so that the sum of the lengths of all transformations in the frame is equal to the length of the core frame, that is, 512 samples.
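The figures in this configuration can be cross-checked with a short calculation. The sketch below assumes that the frame length is 20 ms for 512 core samples (which implies a core sampling rate of 25.6 kHz); the table and function names are hypothetical.

```python
FRAME_SAMPLES = 512   # core frame length stated above
FRAME_MS = 20.0       # frame duration, assuming the unit is milliseconds

# Transformation lengths (MDCT coefficients) and overlap widths (samples) from the text.
TRANSFORM_LEN = {"TCX-20": 512, "TCX-10": 256, "TCX-5": 128}
OVERLAP_SAMPLES = {"max": 256, "half": 128, "min": 16}

def samples_to_ms(num_samples):
    # Duration of a number of core samples at the implied core sampling rate.
    return num_samples * FRAME_MS / FRAME_SAMPLES

def fills_frame(transforms):
    # A combination is valid only if its lengths add up to exactly one core frame.
    return sum(TRANSFORM_LEN[t] for t in transforms) == FRAME_SAMPLES
```

With these values, the maximum, half and minimum overlaps last 10 ms, 5 ms and 0.625 ms, and each of the combinations TCX-20, 2x TCX-10, 4x TCX-5 and the mixed TCX-10/-5 sequences fills the frame exactly.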
[0146] [146] In a preferred model of the coding system of the invention, the encoder operates as follows:
1. The transient detection unit identifies the presence and, if applicable, the location of the onset of the strongest transient in the new signal part of a given frame (that is, excluding the overlap zone between the current frame and the previous frame). The resolution of the index describing the transient location is 1/8 of the frame length, so the index range is 0, ..., 7.
2. If no transient is detected, or if the transient location index is 6 or 7, the affected frame is encoded using the TCX-20 transformation by decision of the transformation length determination unit. Otherwise, a combination of TCX-10 and / or TCX-5 transformations is used: either 2x TCX-10, or 4x TCX-5, or TCX-10 followed by 2x TCX-5, or 2x TCX-5 followed by TCX-10.
3. The overlap width determination unit now controls the overlap shapes of the transformations used within the current frame (excluding the overlap already chosen with the last frame) according to the objectives listed above, in such a way that the longest possible overlaps that do not violate these objectives are selected. In particular, if a frame is TCX-20 and the transient location index is 6 or 7, the overlap unit returns the minimum overlap or the half overlap, respectively. If no non-stationary signal is present in a frame, the maximum overlap is used.
4. In addition, if a TCX-10 / -5 combination was returned by the transformation length determination unit for the (non-stationary) frame, the overlap width determination unit controls the exact composition of the transformation lengths in that frame. In particular, if the maximum overlap is used for the preceding frame as well as for the current frame, 2x TCX-5 followed by TCX-10 is applied in the current frame, the first of the TCX-5 transformations being the transition transformation of the invention with double overlap. If the overlap width of either the last frame or the current frame is less than the maximum, one of the mixed TCX-10 / -5 configurations is also used. If both the last and the current frame have an overlap less than the maximum, 4x TCX-5 is used.
5. The encoder now performs the windowing of the signal and the MDCTs for the frame. Special care must be taken with the order of the windowing operations in the presence of the double overlap transition window of the invention in order to obtain perfect reconstruction after decoding. The remainder of the coding process is identical to that of xHE-AAC. TNS is optionally applied to the individual transformations, and two sets of TCX-5 MDCT coefficients can be grouped into one (interleaved) set of TCX-10-like coefficients to save side information. For each frame, an overlap width value is transmitted to the decoder, as well as a 1-bit flag that indicates TCX-20 or non-TCX-20 coding.
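Steps 2 to 4 of the encoder procedure can be sketched as a small decision function. This is a hedged illustration, not the codec's actual logic: it assumes that the 4x TCX-5 case applies when both overlaps are below the maximum, it takes the current overlap width as an input because the selection objectives are defined elsewhere, and it returns a placeholder label for the mixed case because the text does not say which mixed configuration is chosen. All names are hypothetical.

```python
from typing import List, Optional

MAX, HALF, MIN = "max", "half", "min"

def tcx20_overlap(transient_index: Optional[int]) -> str:
    # Step 3: overlap width returned for a TCX-20 frame.
    if transient_index is None:
        return MAX   # no non-stationary signal in the frame
    if transient_index == 6:
        return MIN
    if transient_index == 7:
        return HALF
    raise ValueError("TCX-20 is only used for no transient or index 6 or 7")

def select_transforms(transient_index: Optional[int],
                      prev_overlap: str, cur_overlap: str) -> List[str]:
    # Step 2: stationary frames and late transients keep the long transformation.
    if transient_index is None or transient_index in (6, 7):
        return ["TCX-20"]
    # Step 4: composition of the TCX-10 / -5 combination.
    if prev_overlap == MAX and cur_overlap == MAX:
        # The first TCX-5 is the double-overlap transition transformation.
        return ["TCX-5", "TCX-5", "TCX-10"]
    if prev_overlap != MAX and cur_overlap != MAX:
        return ["TCX-5"] * 4
    # Exactly one overlap is below the maximum: one of the mixed configurations.
    return ["mixed TCX-10/-5"]
```

For instance, a transient at index 2 with maximum overlap on both sides yields two TCX-5 transformations followed by one TCX-10.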
[0147] [147] Like the encoder, the appropriate decoder according to the preferred model features an overlap width determination unit that interprets the overlap width values transmitted to control the length and window management of the inverse MDCTs, so that the encoder and decoder are completely synchronized with the transformations used. As with the encoder, the order of window management and folding operations after individual MDCTs is central to achieving perfect signal reconstruction.
[0148] [148] Subsequently, another model of the invention is discussed and illustrated in the context of Figures 8 to 15f. This aspect, which is also called the “multiple overlap aspect”, can be combined with the model of changing the overlap width and the transformation length that was discussed in relation to Figures 1 to 7, or can be implemented separately from that aspect.
[0149] [149] An encoder side of the invention is illustrated in Fig. 8a and a decoder side is illustrated in Fig. 8b. In particular, the apparatus for creating an encoded signal, or the encoder, shown in Fig. 8a comprises a window sequence controller 808 to create window sequence information 809, which is routed, for example, to the preprocessor 802, a spectrum converter 804 or an output interface 810, as shown in Fig. 8a. The window sequence information indicates a first window function to create a first frame of spectral values, and a second window function and one or more third window functions to create a second frame of spectral values. The first window function, the second window function and the one or more third window functions overlap within a multiple overlap zone.
[0150] [150] This multiple overlap zone is, for example, illustrated at 1300 in Fig. 13, Fig. 14b, Fig. 15e or Fig. 15f. In this multiple overlap zone 1300, at least three window functions, that is, with respect to Fig. 15f, the first window function illustrated at 1500, the second window function 1502 and the third window function 1503, overlap each other. There may also be a higher overlap, such as an overlap of four, five or even more windows. Alternatively, Fig. 15e illustrates the situation where there is once again a first window function 1500 and a second window function 1502, but now four third window functions 1503 as opposed to the single third window function 1503 of Fig. 15f.
[0151] [151] A preprocessor 802 is provided to correctly handle this multiple overlap zone, which results in a significant reduction of the delay required for the transient anticipation zone. The preprocessor is configured to window a second block of samples corresponding to the second window function and to the one or more third window functions using an auxiliary window function to obtain a second block of windowed samples. In addition, the preprocessor is configured to preprocess the second block of windowed samples using an operation of folding a part of the second block that overlaps with the first block into the multiple overlap part, to obtain a preprocessed second block of windowed samples with a modified multiple overlap part. In addition, a spectrum converter 804 is configured to apply a distortion-introducing transformation to the first block of samples using the first window function to obtain the first frame of spectral values. The spectrum converter is further configured to apply a distortion-introducing transformation to a first part of the preprocessed second block of windowed samples, using the second window function, to obtain a first part of spectral samples of a second frame, and to apply the distortion-introducing transformation to a second part of the preprocessed second block of windowed samples, using the one or more third window functions, to obtain a second part of spectral samples of the second frame. In addition, a processor 806, indicated as a “coding processor”, is provided within the encoder of Fig. 8a to process the first frame and the second frame of spectral values to obtain encoded frames of the audio signal at output 807 of block 806. The coding processor may be identical to or different from the coding processor 110 of Fig. 1a and may perform any MPEG or AMR coding or any other coding known in the art.
[0152] [152] Reference is then made to Fig. 13. Fig. 13 shows again the second half of the first window function 1500, the second window function 1502 and, in the second image of Fig. 13, two third window functions 1503. In contrast, the top illustration in Fig. 13 shows once again a first window function 1500, a second window function 1502 and, in contrast to, for example, Fig. 15f and somewhat similarly to Fig. 15e, four third window functions 1503. Alternatively, the number of third window functions can also be three, five or more.
[0153] [153] Furthermore, Fig. 13 illustrates a situation with a different first window function 1500', a different second window function 1502' and the same third window function 1503. The difference between 1500 and 1500' is that the overlap length of the functions 1500' and 1502' is half that of the windows 1500, 1502. The overlap length of the window functions 1500' and 1502' is thus the half overlap illustrated at 218, for example in Fig. 2d, while the full overlap length corresponds to a complete frame as shown, for example, at 203 in Fig. 2a or Fig. 13. Therefore, the window functions 1500' and 1502' illustrated in this figure represent a combination of the multiple overlap aspect and the overlap width determination aspect.
[0154] [154] To better explain the procedure of the preprocessor 802 on the encoder side, reference is made to the illustration in Fig. 11a on the one hand, and to the flowcharts in Fig. 9a, 9b on the other hand. Regarding the decoder, reference is made to the corresponding illustrations in Fig. 8b, Figures 10a, 10b and the illustration in Fig. 11b. In addition, the encoder is also shown in Fig. 12a and the decoder is shown in Fig. 12b.
[0155] [155] In particular, Fig. 11a illustrates once again the first window function 1500, at least part of the second window function 1502, and either four third window functions 1503 or a single third window function 1503. Fig. 11a further illustrates the auxiliary window function 1100. The auxiliary window function 1100 has a first part 1100a that matches the first rising part 1500a of the first window function 1500. In addition, the auxiliary window function 1100 has a second, non-overlapping part 1100b, which preferably has window coefficients equal to one, and a third part 1100c corresponding to a falling or right part of the one or more third window functions. Therefore, the auxiliary window function 1100 covers the second half of the previous frame illustrated at 1102, the first half of the current frame i indicated by 1103, the second half of the current frame i indicated by 1104 and a first small part 1105 covered by part 1100c of the auxiliary window function. As shown in Fig. 11a, the auxiliary window function is treated as an “initial window sequence” or corresponds to such an “initial window sequence”, as if a sequence of short windows were to be introduced in frame i + 1. It is important, however, that a sequence of short windows has already been introduced in the current frame rather than in the incoming frame i + 1.
[0156] [156] The functionality of the preprocessor is then illustrated in Fig. 11a. The preprocessor preprocesses the second block of windowed samples, obtained by windowing with the auxiliary window function, by means of a folding operation indicated as “initial bend distortion, frame i”. Therefore, the leftmost part of the second block of windowed samples, indicated by 1110, is folded inwards. This part 1110 is the part of the second block of windowed samples that overlaps with the preceding first window function 1500, that is, the part of the second block of windowed samples corresponding to time period 1102 and located in the previous frame i - 1. Because this folding of part 1110 now influences the overlap zone 1300, the folding operation performed by the preprocessor results in a modified multiple overlap part. The spectrum converter now applies the operation illustrated in the line of Fig. 11a indicated as “internal bending distortions”. In particular, the spectrum converter applies a distortion-introducing transformation to the first block of samples using the first window function shown for frame i - 1. The distortion-introducing transformation comprises the folding operation illustrated at 1120 and the subsequent, for example, DCT-IV transformation indicated at 1122. To that end, the first window function 1500 is required to obtain the shape before the folding operation 1120 for frame i - 1. In addition, the spectrum converter applies the distortion-introducing transformation to the first part indicated by item 1131 in Fig. 11a. This is done using the second window function 1502 and, in particular, the right part of the second window function 1502. This operation results in a first part of spectral samples of a second frame obtained by transformation 1132, where transformation 1132 once more represents a DCT-IV operation which constitutes, together with the corresponding folding operation, but now only in the overlap part on the right side of block 1131, the distortion-introducing transformation.
[0157] [157] In addition, the spectrum converter is configured to apply the distortion-introducing transformation to a second part 1133 of the preprocessed second block 1130 using the one or more third window functions 1503 to obtain a second part 1135 of the spectral samples of the second frame. Therefore, to obtain the second part 1135 of spectral samples, four N / 8 DCT-IV transformations or a single N / 2 DCT-IV transformation can be applied. The number of transformations and their lengths depend on the number of third window functions. In general, the transformation length, or the number of spectral samples per transformation in the second part 1135, is equal to the number of spectral samples in a frame minus the length of transformation 1132, the result then being divided by the number of third window functions used.
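The length rule stated here is simple arithmetic. The following sketch uses a hypothetical frame size of N = 1024 spectral samples purely for illustration; the function name is likewise an assumption.

```python
def third_transform_length(frame_len, second_len, num_third):
    # (spectral samples per frame - length of transformation 1132) / number of third windows
    remaining = frame_len - second_len
    if num_third < 1 or remaining % num_third != 0:
        raise ValueError("third window functions must split the remainder evenly")
    return remaining // num_third

N = 1024  # hypothetical number of spectral samples per frame
four_short = third_transform_length(N, N // 2, 4)    # four third windows -> N / 8 each
single_long = third_transform_length(N, N // 2, 1)   # one third window -> N / 2
```

In both cases the lengths of the second-window transformation and of the third-window transformations add up to N, so the frame grid is preserved.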
[0158] [158] Therefore, the preprocessor 802 is generally operative to window 902 (Fig. 9a) the audio signal using the auxiliary window function 1100 to obtain the second block of windowed samples. Then, in step 904, the preprocessor preferably applies the folding operation indicated at 1110 in Fig. 11a to obtain the preprocessed second block of windowed samples with the modified multiple overlap part 1300. Then, in step 906, the converter applies the transformations using the first, second and third window functions to obtain the first frame of spectral values 1122, the first part 1132 of the second frame and the second part 1135 of the second frame, or frame i in the notation of Fig. 11a.
[0159] [159] In the preferred model illustrated in Fig. 9b, the auxiliary window function is determined 910 by referring to the first window function and, by way of example, selecting the first part 1500a of the first window function as the first part 1100a of the auxiliary window function 1100. In addition, the non-overlapping part 1100b is determined (window coefficients of one are taken for the corresponding length), and the third part 1100c is then determined, again by way of example, by taking the second part of the short window function.
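The three-part construction of the auxiliary window function can be sketched as below. The sine slope shape, the part lengths and the function names are assumptions made only for illustration; the description merely requires that part 1100a match the first window function, part 1100b be one and part 1100c match the short window function.

```python
import math

def rising_slope(length):
    # Rising half of a sine window of the given overlap length (assumed shape).
    return [math.sin(math.pi / 2 * (n + 0.5) / length) for n in range(length)]

def auxiliary_window(long_overlap, flat_len, short_overlap):
    first = rising_slope(long_overlap)          # part 1100a: taken from the first window function
    middle = [1.0] * flat_len                   # part 1100b: non-overlapping, coefficients of one
    third = rising_slope(short_overlap)[::-1]   # part 1100c: falling part of the short window
    return first + middle + third
```

For example, `auxiliary_window(16, 8, 4)` yields 28 coefficients: 16 rising, 8 equal to one and 4 falling.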
[0160] [160] Then, the audio signal is windowed 912 with this auxiliary window function in the correct relation to the previous or first frame i - 1 illustrated in Fig. 11a. Then, as shown at 914 in Fig. 9b, the left part 1110 and preferably the right part 1111 are folded. In step 916, the overlapping parts are left over, this being illustrated in dotted lines in item e) or f) in the inner area. In addition, as shown at 918, if there are several third window functions as in Fig. 11a, sub-image e), the overlapping parts of the third window functions are also folded. However, if there is only a single third window function as shown in Fig. 11a, sub-image f), control proceeds from step 916 directly to step 920 without step 918. In step 920, DCT operations are performed using DCT kernels shorter than the DCT kernel for the first frame. The DCT kernel for sub-image e) is N / 2 for the second window function and N / 8 for the third window functions. Conversely, when there is only a single third window function, the transformation kernel is equal to N / 2 for the second window function and equal to N / 2 for the single third window function.
[0161] [161] Therefore, it is clear that the multiple overlap zone 1300 is windowed twice. The first windowing is performed by the first part 1100a of the auxiliary window and the second windowing is performed by the second half of the first third window function 1503, as shown in sub-image e) or f) of Fig. 11a.
[0162] [162] Reference is made again to Fig. 13. As discussed in the context of Fig. 1a or in the context of Fig. 8a, the window sequence controller creates the specific window shapes. In one model, the window sequence controller is configured to comprise the transient location detector 106. When a transient is detected in transient detection parts 0 or 1, the encoder is controlled to go into the multiple overlap part mode, so that these transients, indicated at 1305, are confined to a single third window or to two adjacent third windows. Specifically, the left transient 1305 is confined to the first short window function only, whereas the right one of the transients 1305 lies in the first to third window functions. However, when it is determined that the transients are located in a zone other than 0, such as in zone 1, 2, 3 or so, processing can be carried out without the multiple overlap zone, for example, identically to what was discussed in the context of Fig. 6a, Fig. 6b, Fig. 7 or so.
[0163] [163] Contrary to this, however, the processing of the multiple overlap zone can also be carried out in the context of the application of the window change, in which, when a transient is detected, an even larger set of short windows can be changed for the current frame, so that, preferably within one and the same block or grid of the frame, a long window or a specified number of short windows is used to manage the windows. The first window corresponds to window 1500, for example in Fig. 13, the second window corresponds to window 1502 and a change is made, without referring to a certain location of the transient, for a number of the third functions of the window only when it is detected a transient anywhere in the current frame without knowing exactly where within the frame the transient is.
[0164] [164] However, it is preferable, in order to keep the number of third functions of the window as small as possible, that the change to the mode of the multiple overlay part and the additional change of the selection of the transformation overlay and the transformation length are performed according to the specific location of the transient within the frame, that is, in one or preferably four or even eight different parts of a frame or a part of the time corresponding to a frame, where this part of the time is then equal to half the size of a long window, such as the long window 1500 in Fig. 13. Preferably, the multiple overlapping part is, as seen in Fig. 13, located before a start 208 (shown in Fig. 2 on the one hand and in Fig. 13 on the other hand) of the anticipation zone.
[0165] [165] Analogous processing is carried out on the decoder side. In one model, an apparatus for decoding an encoded audio signal 821, which comprises a first encoded frame and a second encoded frame, comprises a decoding processor 824 of Fig. 8b for processing the first encoded frame and the second encoded frame to obtain a first frame of spectral values and a second frame of spectral values, the first and second frames comprising distortion parts. A time converter 826 is connected to the decoding processor 824, and the time converter 826 is configured to apply a transformation to this first frame using a first window function to obtain a first block of samples. In addition, the time converter 826 is configured to apply the transformation to a first part of the second frame using a second window function, and to apply the transformation to a second part of the second frame using one or more third window functions, to obtain the second block of samples. As discussed in the context of Fig. 1a, the first window function 1500, the second window function 1502 and the one or more third window functions 1503 together have a multiple overlap zone 1300.
[0166] [166] In addition, the decoder comprises a post-processor 828 to post-process the second block of samples using an unfold operation to obtain a post-processed second block of samples with a portion of the second block of samples overlapping the first block of samples in the multiple overlap zone. In addition, post-processor 828 is configured to window the post-processed second block of samples using the auxiliary window function discussed in the context of Fig. 8a and Fig. 11a. The post-processor 828 performs an overlap addition of the windowed post-processed second block of samples and the first block of samples to obtain the decoded audio signal indicated at 829 in Fig. 8b or at block 175 in Fig. 1c. Therefore, the post-processor 828 of Fig. 8b may basically have the functionality of the synthesis window manager 172, with respect to the auxiliary window function, and of the overlap adder 174.
[0167] [167] Subsequently the functionality of the post-processor in collaboration with the time converter is discussed in relation to the illustration in Fig. 11b, which illustrates a reverse processing in relation to the illustration in the encoder in Fig. 11a. The first frame of spectral values 1142 is introduced in an inverse transformation of size N 1161 and the first part 1152 of the second frame is introduced in an inverse transformation N / 2 1162 and depending on the number of the third functions of the window, the second part 1155 of the second frame is introduced in four N / 8 short transformations 1163 or in a single N / 2 transformation 1162 identical to the process of the first part 1152 of the second frame.
[0168] [168] This procedure is performed by the time converter. The time converter additionally uses the first window function to perform the window management in conjunction with a preceding unfold operation illustrated in 1170 in Fig. 11b. In addition, the second window function is used when applying the procedures to the first part 1152 illustrated in 1172. Specifically, the rightmost part 1173 of the second window function is unfolded and the subsequent window management is performed, while, on the left side of the frame, no internal unfolding is carried out. In addition, the transformation performs a specific unfolding, subsequent window management and an additional overlay addition not only with the first part 1152 of the second frame, but also with the second part 1155 of the second frame, as illustrated in 1172 in Fig. 11b. If there is only a single third window function, as illustrated in sub-image f) in Fig. 11b, only a single unfolding operation is carried out on both sides together with the window management, using the right part of the second window function and the left side of the third window function, followed by the overlay addition within the overlay range 1174.
[0169] [169] The post-processor then applies post-processing using the unfolding operation illustrated in 1175 with the first part of the result of the procedure in 1172 to obtain a part 1176a that extends into the previous frame and preferably a part 1176b that extends into the next frame. Then, the window management is performed with the unfolded parts 1176a, 1176b and of course with the part inside the current frame i, using the auxiliary window function, to obtain the state illustrated in 1175. Then, a final overlay addition of the second post-processed sample block, windowed with the auxiliary window function, and the first sample block is performed in and within the overlay range 1180 to obtain the final decoded audio signal corresponding to this overlay range 1180. In addition, this procedure results in a subsequent part 1181 of decoded audio signal samples, because there is no overlap there, and the next section 1182 is obtained by overlapping with the corresponding part of a window function for frame i + 1, which follows frame i in time.
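The three-part structure of the auxiliary window function set out in the claims (a first part matching the overlap with the first window function, a middle part with coefficients at or near unity, and a third part matching the overlap with the following window) can be sketched as follows. This is a hypothetical illustration only: the sine-shaped slopes, the function name and the part lengths are assumptions, since the description notes that the exact slope shapes vary.

```python
import math


def auxiliary_window(rise_len, flat_len, fall_len):
    """Build a window with a rising slope, a flat unity part and a falling slope.

    The sine/cosine slope shape is an assumption made for illustration;
    any slope pair matching the neighbouring windows could be used instead.
    """
    rise = [math.sin(math.pi * (i + 0.5) / (2 * rise_len)) for i in range(rise_len)]
    flat = [1.0] * flat_len
    fall = [math.cos(math.pi * (i + 0.5) / (2 * fall_len)) for i in range(fall_len)]
    return rise + flat + fall


# Hypothetical lengths: long overlap on the left, short overlap on the right.
w = auxiliary_window(4, 8, 2)
```

The asymmetric lengths mirror the situation in which the left overlap with the long first window is wider than the right overlap with the following window.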
[0170] [170] Therefore, as illustrated in Fig. 10a, the decoder side method comprises applying 1000 a transformation to the first frame using the first window function, applying 1010 the transformation to the first part of the second frame using the second window function, and applying 1020 the transformation to the second part of the second frame using the third window function(s). Then, in step 1030, an unfold operation is performed; in step 1040, window management is performed using the auxiliary window function; and in step 1050, an overlay addition of the windowed second post-processed block and the first block is performed to obtain the decoded audio signal at the end of the processing illustrated, for example, in Fig. 11b.
[0171] [171] As shown in Fig. 10b, the preferred models include performing an inverse DCT operation for each part of the second frame, that is, performing several DCT operations with shorter lengths compared to the previous frame i - 1, where a long window 1500 was used. In step 1070 the internal distortion parts are unfolded as in the operation illustrated in 1172, and the unfolding is preferably a mirroring at the corresponding limit, illustrated as vertical lines on the line indicated by 1172 in Fig. 11b. Then, in step 1080, window management is performed using the second and third window functions within block 1184, and the subsequent overlay addition of the window management result within the block is made as illustrated in 1090. Next, as indicated in 1092, the left / right or, in other words, anterior / posterior distortion parts of the overlay addition result are unfolded to obtain the part 1176a extending into the previous frame and the part 1176b extending into the following frame. However, the representation in 1175 is obtained only after the window management using the auxiliary window function, illustrated in 1094. Then, in step 1096, an overlay addition with the first block of samples is performed after the window management using the auxiliary window function.
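The basic building blocks named above (inverse DCT, unfolding by mirroring, window management and overlay addition) can be illustrated for the ordinary single-overlap case. The sketch below is not the multiple overlap procedure of Fig. 11b; it only shows, under stated assumptions, how folding, a DCT-IV, unfolding and windowed overlay addition cancel the distortion parts. The sine window, the fold sign convention and all function names are assumptions chosen for illustration.

```python
import math


def dct_iv(x):
    """Plain O(N^2) DCT-IV; applying it twice scales the input by N/2."""
    n = len(x)
    return [sum(x[j] * math.cos(math.pi / n * (j + 0.5) * (k + 0.5))
                for j in range(n)) for k in range(n)]


def fold(block):
    """Fold 2N samples (a, b, c, d) into N samples (-c_r - d, a - b_r)."""
    h = len(block) // 4
    a, b, c, d = (block[i * h:(i + 1) * h] for i in range(4))
    first = [-c[h - 1 - i] - d[i] for i in range(h)]
    second = [a[i] - b[h - 1 - i] for i in range(h)]
    return first + second


def unfold(u):
    """Mirror N samples (u1, u2) back to 2N samples (u2, -u2_r, -u1_r, -u1)."""
    h = len(u) // 2
    u1, u2 = u[:h], u[h:]
    return (u2 + [-v for v in reversed(u2)]
            + [-v for v in reversed(u1)] + [-v for v in u1])


def sine_window(length):
    """Symmetric sine window; squares of samples N apart add up to one."""
    return [math.sin(math.pi * (i + 0.5) / length) for i in range(length)]


def forward(block, window):
    """Window management, folding and DCT-IV (encoder side)."""
    return dct_iv(fold([s * w for s, w in zip(block, window)]))


def inverse(coeffs, window):
    """Inverse DCT-IV, unfolding by mirroring, then window management."""
    n = len(coeffs)
    u = [2.0 / n * v for v in dct_iv(coeffs)]
    return [s * w for s, w in zip(unfold(u), window)]


def roundtrip(signal, n):
    """Encode and decode blocks of 2n samples with hop n and overlay-add them."""
    window = sine_window(2 * n)
    out = [0.0] * len(signal)
    for start in range(0, len(signal) - 2 * n + 1, n):
        y = inverse(forward(signal[start:start + 2 * n], window), window)
        for i, v in enumerate(y):
            out[start + i] += v
    return out


signal = [math.sin(0.3 * i) + 0.1 * i for i in range(32)]
decoded = roundtrip(signal, 8)
```

Samples covered by two overlapping blocks (here indices 8 to 23) are reconstructed, while the outermost samples still contain the mirrored distortion parts because they have no overlapping neighbour.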
[0172] [172] Subsequently, reference is made to Fig. 12a and Fig. 12b. Item a in Fig. 12a corresponds to the procedure in the first line of Fig. 11a. The procedure in sub-image b) corresponds to the procedure performed in the second and third lines of Fig. 11a and the procedures illustrated in item c) in Fig. 12a correspond to the procedures in the last two lines of Fig. 11a. Similarly, the representation of the decoder side corresponds to Fig. 12b. In particular, the first two lines of Fig. 11b correspond to sub-image f) in Fig. 12b. The third and fourth lines correspond to item e) in Fig. 12b, and the last line in Fig. 12b corresponds to the last line in Fig. 11b.
[0173] [173] Fig. 14a illustrates a situation where the window sequence controller on the encoder side or elements 824, 826, 828 on the decoder side are configured to switch between a non-multiple overlap situation as in Fig. 14a and a multiple overlap situation illustrated in Fig. 14b. Therefore, when a transient is detected in the transient part 0, a procedure is not to apply the multiple overlay part but to switch to short single overlay windows TCX-10 from the TCX-20 windows. Preferably, however, one moves to a multiple overlapping part by applying a sequence of windows comprising the first window 1400, the second window 1402 and one or, in the model of Fig. 14b, two third windows 1403.
[0174] [174] The overlays and window sizes in Fig. 14b are slightly different from the illustration in Fig. 13, but it is clear that the general procedures regarding the encoder side in Fig. 11a or the decoder side in Fig. 11b occur in the same way.
[0175] [175] Reference is now made to Fig. 15. Specifically, Fig. 15 illustrates, as black boxes, the transient detection anticipation 1590 and the duration of the resulting pre-echo 1595. Fig. 15a illustrates a traditional High Efficiency AAC type sequence that comprises a long start window, eight short windows, a long stop window and so on. The anticipation required is high and reaches N + N / 2 + N / 16, but the pre-echo 1595 is small. Similarly, Fig. 15b illustrates a traditional low-delay AAC type transient detection procedure resulting in a window sequence comprising a long sequence, a long start window, a low overlap window and a long stop window. The anticipation of transient detection is the same as in Fig. 15a, but the duration of the pre-echo is longer than in Fig. 15a. On the other hand, however, the efficiency is higher because the more short windows are used, the lower the bit rate efficiency.
[0176] [176] Fig. 15c and 15d illustrate an implementation of the High Efficiency AAC or a low delay AAC procedure with a reduced transient detection anticipation of N / 16 samples, showing only the long sequences that are possible with this reduced anticipation. If the sequence consists of a long window, a long window, a long start window, a long stop window, and so on, as shown in Fig. 15d, only the post-echo is reduced compared to Fig. 15c, but the pre-echo 1595 is the same. Therefore, Figs. 15c and 15d illustrate the same short anticipation as Figs. 15e and 15f of the invention. If we now implement the part of the multiple overlay as in Figures 15c and 15e, only sequences like those in these figures could be used, but no change to a short window would be possible. Therefore, the multiple overlay part allows you to switch to short windows to reduce the pre-echo and post-echo, or to use a short anticipation delay, or both, in order to reduce the delay and to reduce the pre-echo and the post-echo.
[0177] [177] Fig. 15e illustrates a High Efficiency AAC sequence with reduced anticipation of the transient detection of N / 16 samples and the preferred multiple overlap zone 1300. The sequence comprises a long window, another long window 1500, another initial sequence 1502, four short sequences 1503 and a long stop window 1504. As is clear, the anticipation is small, as is the pre-echo. A similar situation is obtained for Fig. 15f, which illustrates a configuration similar to that in Fig. 15e, but with only a single third window function instead of four short sequences.
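To give a feel for the difference between the anticipation of N + N / 2 + N / 16 in Fig. 15a and the reduced anticipation of N / 16 in Figs. 15c to 15f, the two values can be evaluated for an example frame length. The frame length N = 1024 and the sampling rate of 48 kHz used below are assumptions chosen only for illustration; the function names are likewise hypothetical.

```python
def conventional_lookahead(n):
    """Transient detection anticipation of Fig. 15a: N + N/2 + N/16 samples."""
    return n + n // 2 + n // 16


def reduced_lookahead(n):
    """Reduced transient detection anticipation of Figs. 15c to 15f: N/16 samples."""
    return n // 16


def to_milliseconds(samples, sample_rate_hz):
    """Convert a sample count into a duration in milliseconds."""
    return 1000.0 * samples / sample_rate_hz


n = 1024                 # assumed frame length, not taken from the description
rate = 48000             # assumed sampling rate in Hz
long_la = conventional_lookahead(n)
short_la = reduced_lookahead(n)
long_ms = to_milliseconds(long_la, rate)
short_ms = to_milliseconds(short_la, rate)
```

Under these assumptions the conventional anticipation is 25 times the reduced one, which is the delay saving that the multiple overlap zone is meant to make compatible with short windows.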
[0178] [178] Although the present invention has been described in the context of block diagrams, where the blocks represent actual or logical hardware components, the present invention can also be implemented by a computer-implemented method. In the latter case, the blocks represent corresponding steps of the method, where these steps stand for the functionalities performed by corresponding logical or physical hardware blocks.
[0179] [179] Although some aspects have been described in the context of an apparatus, it is clear that these aspects also represent a description of the corresponding method, in which a block or device corresponds to a method step or to a characteristic of a method step. Similarly, the aspects described in the context of a method step also represent a description of a corresponding block or item or characteristic of a corresponding apparatus. Some or all of the method steps can be performed (or used) by a hardware device, such as a microprocessor, a programmable computer or an electronic circuit. In some versions, one or more of the most important steps in the method can be performed by such a device.
[0180] [180] The transmitted or encoded signal of the invention can be stored on a digital storage medium or can be transmitted on a transmission medium, such as a wireless transmission medium or a wired transmission medium, such as the Internet.
[0181] [181] Depending on certain implementation requirements, the models of the invention can be implemented in hardware or in software. The implementation can be carried out using a digital storage medium, for example a floppy disk, a DVD, a Blu-Ray, a CD, a ROM, a PROM, an EPROM, an EEPROM or a FLASH memory, with electronically readable control signals stored thereon, which cooperate (or are able to cooperate) with a programmable computer system, so that the respective method is executed. Therefore, the digital storage medium can be computer-readable.
[0182] [182] Some versions according to the invention comprise a data carrier with electronically readable control signals, which are able to cooperate with a programmable computer system, so that one of the methods described here is performed.
[0183] [183] Generally speaking, the models of the present invention can be implemented as a computer program product with a program code, the program code being operative to execute one of the methods when the computer program product runs on a computer. The program code can, for example, be stored on a machine-readable medium.
[0184] [184] Other models include the computer program for executing one of the methods described here, stored on a machine-readable medium.
[0185] [185] In other words, a model of the method of the invention is, therefore, a computer program with a program code for executing one of the methods described here, when the computer program runs on a computer.
[0186] [186] Another model of the method of the invention is, therefore, a data medium (or a non-transitory storage medium, such as a digital storage medium, or a computer reading medium) comprising, recorded there, the program computer to perform one of the methods described here. The data medium, the digital storage medium or the recorded medium are typically tangible and / or non-transitory.
[0187] [187] Another model of the method of the invention is, therefore, a data stream or a sequence of signals representing the computer program to execute one of the methods described here. The data stream or signal sequence can, for example, be configured to be transferred via a data communication link, for example via the Internet.
[0188] [188] Another model comprises a processing medium, for example, a computer, or a programmable logic device, configured or adapted to perform one of the methods described here.
[0189] [189] Another model comprises a computer with the computer program installed to perform one of the methods described here.
[0190] [190] Another version according to the invention comprises an apparatus or system configured to transfer (for example, electronically or optically) a computer program to perform one of the methods described herein to a receiver. The receiver can, for example, be a computer, a mobile device, a memory device or the like. The apparatus or system may, for example, comprise a file server for transferring the computer program to the receiver.
[0191] [191] In some models, a programmable logic device (for example a field-programmable gate array) can perform some or all of the functionality of the methods described here. In some models, a field-programmable gate array can cooperate with a microprocessor to perform one of the methods described here. In general, the methods are preferably performed by any hardware device.
[0192] [192] The models described above are merely illustrative for the principles of the present invention. It is understood that the changes and variations in the provisions and details described will be evident to professionals in the field. It is, therefore, intended that the invention be limited only by the scope of the appended patent claims and not by the specific details of the description and explanation of the models contained herein.
[0193] [193] References [1] International Organization for Standardization, ISO/IEC 14496-3:2009, "Information Technology - Coding of audio-visual objects - Part 3: Audio," Geneva, Switzerland, Aug. 2009. [2] Internet Engineering Task Force (IETF), RFC 6716, "Definition of the Opus Audio Codec," Proposed Standard, Sept. 2012. Available online at http://tools.ietf.org/html/rfc6716. [3] C. R. Helmrich, "On the Use of Sums of Sines in the Design of Signal Windows," in Proc. of the 13th Int. Conference on Digital Audio Effects (DAFx-10), Graz, Austria, Sept. 2010. [4] J. Herre and J. D. Johnston, "Enhancing the Performance of Perceptual Audio Coders by Using Temporal Noise Shaping (TNS)," in Proc. 101st AES Convention, LA, USA, Nov. 1996. [5] M. Neuendorf et al., "MPEG Unified Speech and Audio Coding - The ISO/MPEG Standard for High-Efficiency Audio Coding of All Content Types," in Proc. 132nd AES Convention, Budapest, Hungary, Apr. 2012. Also to appear in the Journal of the AES, 2013.
Claims:
Claims (33)
[0001]
Apparatus for creating an audio or image signal encoded in the presence of transients, characterized by comprising: a window sequence controller (808) to create window sequence information (809) to manage the windows of an audio or image signal, the window sequence information indicating a first window function (1500) to create a first frame of spectral values, a second window function (1502) and at least a third window function (1503) to create a second frame of spectral values with a first and a second part, where the first window function (1500), the second window function (1502) and the one or more third window functions overlap within a multiple overlap zone (1300); a preprocessor (802) for managing the windows (902) of a second block of samples corresponding to the second window function and the one or more third window functions, using an auxiliary window function (1100) to obtain a second block of window samples, and to preprocess (904) the second window sample block, using an operation of folding a part of the second block that overlaps with a first block on the multiple overlapping part (1300) to obtain a second pre-processed block of samples in a window with a modified multiple overlap part; a spectrum converter (804) to apply a distortion-introducing transformation (906) to the first block of samples using the first window function (1500) to obtain the first frame of spectral values, to apply another distortion-introducing transformation on a first part of the second pre-processed block of window samples using the second window function (1502) to obtain a first part of the spectral values of the second frame, and to apply one or more other distortion-introducing transformations to a second part of the second pre-processed block of the window samples, using one or more third window functions (1503) to obtain a second part of the second frame's spectral values; and a processor (806) for processing the first frame and the second frame to obtain encoded frames of the audio or image signal.
[0002]
Apparatus for creating an encoded audio or image signal in the presence of transients, according to claim 1, characterized in that the second window function (1502) has a first part (1100a) that overlaps with the first window function (1500), wherein one or more third functions of the window (1503) have a second part (1111) that overlaps with a fourth function of the window following one or more third function (s) of the window (1503), and wherein the preprocessor (802) is configured to apply the auxiliary window function (1100), the auxiliary window function having a first part (1100a) identical to the first part of the second window function and has a third part (1100c) ) identical to the second part of one or more third functions of the window, where a second part of the auxiliary function of the window extends between the first part and the third part.
[0003]
Apparatus for creating an encoded audio or image signal in the presence of transients, according to claim 2, characterized in that the auxiliary function of the window has the second part (1100b) corresponding to a second part of one or more third functions of the window (1503), or wherein the second part (1100b) has window coefficients greater than 0.9 or equal to unity, or wherein the length of the second part is such that the second pre-processed block of window samples results in a number of spectral values identical to the number of spectral values in the first frame.
[0004]
Apparatus for creating an encoded audio or image signal in the presence of transients, according to any one of claims 1 to 3, characterized in that the window sequence controller (808) is configured to create the window sequence information (809), such that the second window function (1502) or the third window function (1503) has a size or duration less than the size or duration of the first window function (1500).
[0005]
Apparatus for creating an encoded audio or image signal in the presence of transients, according to any one of claims 1 to 4, characterized in that the preprocessor (802) is configured to use, as the auxiliary function of the window, an initial function of the window (1100) such that a number of spectral values derived by transforming the second sample block into a window to obtain the second frame is equal to a number of spectral values of the first frame.
[0006]
Apparatus for creating an encoded audio or image signal in the presence of transients, according to any one of claims 1 to 5, characterized in that the spectrum converter (804) is configured to manage the windows of the first block of samples using the first window function to obtain a first block of samples in a window and to apply the distortion introduction transformation in the first block of samples in window.
[0007]
Apparatus for creating an audio or image signal encoded in the presence of transients, according to any one of claims 1 to 6, characterized in that the spectrum converter (804) is configured to manage the windows of the first part of the second pre-processed block using a second part of the second window function, where a first part of the second window function is not used to manage windows , and to apply the distortion introducing transformation to a first part in windows of the second pre-processed block.
[0008]
Apparatus for creating an encoded audio or image signal in the presence of transients, according to any one of claims 1 to 7, characterized in that the spectrum converter (804) is configured to manage the windows of the second part of the second pre-processed block using one or more third functions of the window, except a second part of the third function of the window or a second part of a third function of the later window in terms of time or space.
[0009]
Apparatus for creating an audio or image signal encoded in the presence of transients, according to any one of claims 1 to 8, characterized in that the preprocessor (802) is configured to perform, in the folding operation, a time or space reversal of the part and a weighted addition of the time-reversed or space-reversed part to a part onto which the part of the second block has been folded.
[0010]
Apparatus for creating an encoded audio or image signal in the presence of transients, according to any one of claims 1 to 9, characterized in that the preprocessor (802) is configured to additionally use another operation to fold a part of the second block that overlaps with the fourth window function following one or more third window functions in time or space to obtain the second pre-processed block of the window samples.
[0011]
Apparatus for creating an audio or image signal encoded in the presence of transients, according to any one of claims 1 to 10, characterized in that the spectrum converter (804) is configured to perform a modified discrete cosine transformation operation (MDCT) or a modified discrete sine transformation operation (MDST).
[0012]
Apparatus for creating an encoded audio or image signal in the presence of transients, according to any one of claims 1 to 11, characterized in that the spectrum converter (804) is configured to perform the MDCT or MDST operation by applying a folding operation to reduce a number of samples and a subsequent discrete cosine or discrete sine transformation operation on the reduced number of samples .
[0013]
Apparatus for creating an audio or image signal encoded in the presence of transients, according to any one of claims 1 to 12, characterized in that the window sequence controller (808) comprises a transient detector (106) for detecting a location of the transient in an anticipation zone of the first frame, and in which the window sequence controller (808) is configured to create the window sequence information (809) in response to a detection of a transient location in the anticipation zone or in a specific part of the anticipation zone, and where the window sequence controller (808) is configured to create other sequence information indicating a sequence of the first overlap windows, when the transient is not detected in the anticipation zone or is detected in a part of the anticipation zone other than the specific part.
[0014]
Apparatus for creating an audio or image signal encoded in the presence of transients, according to any one of claims 1 to 13, characterized in that the specific part is a quarter of a start from the center of the current frame.
[0015]
Apparatus for creating an encoded audio or image signal in the presence of transients, according to any one of claims 13 or 14, characterized by the multiple overlapping part being, in time and space, before the beginning of the anticipation zone, or in a part of the anticipation zone, in the first frame.
[0016]
Apparatus for creating an encoded audio or image signal in the presence of transients, according to any one of claims 13 or 14, characterized in that the window sequence controller (808) is configured to select a specific window from a group of at least three windows depending on the location of the transient (107), the group of at least three windows comprises a first window (201) with a first length of overlap (203), a second window (225) with a second length of overlap (218) and a third window (222) with a third length of overlap (229) or without overlap, wherein the first length of overlap is greater than the second overlap length and where the second overlap length is greater than the third overlap length or greater than a zero overlap, where the specific window is selected based on the location of the transient, so so that one of two consecutive overlap windows has first window coefficients at the location of the transient and the other of the two consecutive overlap windows s has second window coefficients at the location of the transient, where the second window coefficients are at least nine times greater than the first window coefficients.
[0017]
Apparatus for decoding an encoded audio or image signal in the presence of transients, comprising a first encoded frame and a second encoded frame, characterized by comprising: a processor (824) for processing the first encoded frame and the second encoded frame to obtain a first frame of spectral values and a second frame of spectral values, the first and second frames comprising a distortion part; a time converter (826) to apply a transformation to the first frame of spectral values using a first function of the window (1500) to obtain a first block of samples, to apply another transformation to a first part of the second frame of spectral values using a second window function (1502), and to apply another or more transformations to a second part of the second spectral value frame using one or more third window functions (1503) to obtain a second sample block, wherein the first function of the window (1500), the second function of the window (1502) and the third function of the window form a multiple overlap zone (1630); and a post-processor (828) for post-processing the second sample block using an unfold operation to obtain a second post-processed sample block with a portion of the second sample block overlapping the first sample block in the overlap zone multiple, to manage the windows of the second post-processed block of samples using an auxiliary function of the window (1100), and to add the overlay of the second post-processed block in the samples window and the first block of samples to obtain a decoded audio or image signal (1180).
[0018]
Apparatus for decoding an encoded audio or image signal in the presence of transients, according to claim 17, characterized in that the application of the transformation comprises performing an overlapping addition (1172) of a first part of the second sample block and a second part of the second sample block to obtain the second sample block.
[0019]
Apparatus for decoding an encoded audio or image signal in the presence of transients, according to claim 18, characterized in that the unfolding operation comprises mirroring samples with respect to a limit of the second sample block.
[0020]
Apparatus for decoding an encoded audio or image signal in the presence of transients, according to claims 17 to 19, characterized in that the time converter (826) is configured to use exactly a third window function (1503) and a length of the third window function is such that a number of spectral values equal to 50% of the number of spectral values of the first frame is transformed and a result is displayed in a window through the third window function, or in which the time converter is configured to use exactly two third windows and a length of the third window is such that a number of spectral values equal to 1/8 of the number of spectral values of the first frame is transformed, or where the time converter is configured to use exactly a third window and the length of the third window is such that a number of spectral values equal to 1/4 of the number of spectral values of the first frame is transformed, or to use exactly four third windows and the length of a third window is such that a number of spectral values is equal to 1/8 d the number of spectral values of the first frame.
[0021]
Apparatus for decoding an encoded audio or image signal in the presence of transients, according to any one of claims 17 to 20, characterized in that the encoded audio or image signal comprises a window indication (603) associated with the first and second encoded frames, wherein the apparatus further comprises an interface (820) for extracting and analyzing the window indication; and wherein the time converter or post-processor (828) is configured to be controlled by the window indication to apply a window shape or a specified window length or transformation length.
[0022]
Apparatus for decoding an encoded audio or image signal in the presence of transients, according to any one of claims 17 to 21, characterized in that the second window function (1502) has a first part (1100a) that overlaps with the first window function (1500), where the third window function or third has a second part (1111) that overlaps with a fourth window function following one or more third window functions (1503), and where the post processor is configured to apply the window auxiliary function (1100), the window auxiliary function having a first part (1100a ) identical to the first part of the second window function, and having a third part (1100c) identical to the second part of one or more third window functions, in which the second part of the window auxiliary function extends between the first part and the third part.
[0023]
Apparatus for decoding an encoded audio or image signal in the presence of transients, according to any one of claims 17 to 22, characterized in that the auxiliary function of the window has the second part (1100b) corresponding to a second part of one or more third functions of the window (1503), or in which the second part (1100b) has window coefficients greater than 0.9 or equal to unity, or where the length of the second part is such that the second pre-processed block of window samples results in a number of spectral values identical to the number of spectral values in the first frame.
[0024]
Apparatus for decoding an encoded audio or image signal in the presence of transients, according to any one of claims 17 to 23, characterized in that the window sequence information (809) is such that the second window function (1502) or one or more third window functions (1503) has a size or duration less than the size or duration of the first window function (1500 ).
[0025]
Apparatus for decoding an encoded audio or image signal in the presence of transients, according to any one of claims 17 to 23, characterized in that the post-processor is configured to use, as the auxiliary function of the window, an initial function of the window (1100) such that a number of spectral values derived by transforming the second block of samples into a window to obtain the second frame is the same to a number of spectral values from the first frame.
[0026]
Apparatus for decoding an encoded audio or image signal in the presence of transients, according to any one of claims 17 to 24, characterized in that the time converter is configured to add overlays of the first part of the second sample block and a second part of the second sample block using a second part of the second window function, where a first part of the second window function is not used.
[0027]
Apparatus for decoding an encoded audio or image signal in the presence of transients, according to any one of claims 17 to 26, characterized in that the time converter is configured to overlap-add the first part of the second sample block using the one or more third window functions, except for a second part of the third window function or a second part of the latest third window function in time or space.
[0028]
Apparatus for decoding an encoded audio or image signal in the presence of transients, according to any one of claims 17 to 27, characterized in that the post-processor is configured to additionally use another folding operation of a part of the second block that overlaps with the fourth window function following one or more third window functions in time or space.
[0029]
Apparatus for decoding an encoded audio or image signal in the presence of transients, according to any one of claims 17 to 28, characterized in that the time converter is configured to apply the transformation using an inverse DCT or inverse DST operation and a subsequent unfold operation.
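The inverse DCT followed by an unfold operation named in the claim above can be sketched as follows. This is only an illustration under stated assumptions: the transform is taken to be a plain MDCT whose core is an unnormalised DCT-IV, and the helper names `dct_iv`, `unfold` and `imdct_direct` are hypothetical, not taken from the patent. The sketch checks that DCT-IV plus unfolding gives the same 2N time samples as evaluating the inverse MDCT cosine sum directly.

```python
import math

def dct_iv(u):
    """Unnormalised DCT-IV (its own inverse up to a factor len(u) / 2)."""
    n = len(u)
    return [sum(u[j] * math.cos(math.pi / n * (j + 0.5) * (k + 0.5))
                for j in range(n))
            for k in range(n)]

def unfold(u):
    """Expand n samples into 2n time samples that still contain aliasing."""
    half = len(u) // 2
    u1, u2 = u[:half], u[half:]
    return (u2
            + [-v for v in reversed(u2)]
            + [-v for v in reversed(u1)]
            + [-v for v in u1])

def imdct_direct(spec):
    """Reference: inverse MDCT cosine sum, without any scaling factor."""
    n = len(spec)
    return [sum(spec[k] * math.cos(math.pi / n * (j + 0.5 + n / 2) * (k + 0.5))
                for k in range(n))
            for j in range(2 * n)]

# Arbitrary spectral values, chosen only for the demonstration.
spec = [1.0, -2.0, 0.5, 3.0, 0.0, -1.5, 2.5, 4.0]
via_unfold = unfold(dct_iv(spec))
direct = imdct_direct(spec)
max_error = max(abs(p - q) for p, q in zip(via_unfold, direct))
```

The unfolded output is not yet the decoded signal; the aliasing it contains is only cancelled once neighbouring blocks are windowed and overlap-added.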
[0030]
Apparatus for decoding an encoded audio or image signal in the presence of transients, according to any one of claims 17 to 29, characterized in that the time converter is configured to apply the transformation so that the transient of the decoded audio or image signal is located in time or space after the multiple overlap zone or is located in a part of time or space not covered by the second window function.
[0031]
Apparatus for decoding an encoded audio or image signal in the presence of transients, according to any one of claims 17 to 30, characterized in that the first part of the second frame comprises n / 2 spectral values and the second part of the second frame comprises four blocks with n / 8 spectral values or a single block with n / 2 spectral values or two blocks for spectral values.
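As a quick check of the counts in the claim above, the two fully specified layouts of the second part each add up, together with the first part, to n spectral values. The frame size `n = 1024` is an assumed example value, not a figure from the claim, and the third alternative is left out because the claim text does not state its block size.

```python
n = 1024  # assumed frame size, for illustration only

first_part = n // 2  # n / 2 spectral values in the first part

# Second-part layouts whose sizes are stated explicitly in the claim.
second_part_layouts = {
    "four blocks of n / 8": 4 * (n // 8),
    "one block of n / 2": 1 * (n // 2),
}

totals = {name: first_part + size for name, size in second_part_layouts.items()}
```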
[0032]
Method for creating an encoded audio or image signal in the presence of transients, characterized by comprising: creating (808) window sequence information (809) for managing the windows of an audio or image signal, the window sequence information indicating a first window function (1500) to create a first frame of spectral values, a second window function (1502) and at least a third window function (1503) to create a second frame of spectral values with a first and a second part, where the first window function (1500), the second window function (1502) and the one or more third window functions overlap within a multiple overlap zone (1300); managing the windows (902) of a second sample block corresponding to the second window function and the one or more third window functions using an auxiliary window function (1100) to obtain a second block of window samples; pre-processing (904) the second block of window samples using a fold operation of a part of the second block that overlaps with the first block in the multiple overlap part (1300) to obtain a second pre-processed block of window samples with a modified multiple overlap part; applying (804) a distortion-introducing transformation (906) to the first sample block using the first window function (1500) to obtain the first frame of spectral values, applying another distortion-introducing transformation to a first part of the second pre-processed block of window samples using the second window function (1502) to obtain a first part of the spectral values of the second frame, and applying one or more further distortion-introducing transformations to a second part of the second pre-processed block of window samples using the one or more third window functions (1503) to obtain a second part of the spectral values of the second frame; and processing (806) the first frame and the second frame to obtain encoded frames of the audio or image signal.
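The windowing, fold and transformation steps of the method above can be illustrated with an ordinary single-overlap MDCT. This is a simplified sketch, not the multiple-overlap scheme of the claim: it assumes the distortion-introducing transformation is an MDCT, uses a sine window as a stand-in for the window functions, and the names `sine_window`, `fold`, `dct_iv` and `mdct_direct` are hypothetical. It checks that folding the windowed block and applying a DCT-IV yields the same spectral values as the MDCT cosine sum.

```python
import math

def sine_window(n):
    """Sine window of length 2n (assumed shape, used only for this demo)."""
    return [math.sin(math.pi * (j + 0.5) / (2 * n)) for j in range(2 * n)]

def fold(block):
    """Fold 2n windowed samples, split into quarters a, b, c, d, into n samples."""
    q = len(block) // 4
    a, b, c, d = (block[i * q:(i + 1) * q] for i in range(4))
    first = [-c[q - 1 - i] - d[i] for i in range(q)]   # -reverse(c) - d
    second = [a[i] - b[q - 1 - i] for i in range(q)]   # a - reverse(b)
    return first + second

def dct_iv(u):
    """Unnormalised DCT-IV of n samples."""
    n = len(u)
    return [sum(u[j] * math.cos(math.pi / n * (j + 0.5) * (k + 0.5))
                for j in range(n))
            for k in range(n)]

def mdct_direct(block):
    """Reference: MDCT cosine sum mapping 2n samples to n spectral values."""
    n = len(block) // 2
    return [sum(block[j] * math.cos(math.pi / n * (j + 0.5 + n / 2) * (k + 0.5))
                for j in range(2 * n))
            for k in range(n)]

n = 8
# Test signal with a single spike standing in for a transient.
samples = [math.sin(0.4 * t) + (2.0 if t == 11 else 0.0) for t in range(2 * n)]
windowed = [w * x for w, x in zip(sine_window(n), samples)]

via_fold = dct_iv(fold(windowed))
direct = mdct_direct(windowed)
max_error = max(abs(p - q) for p, q in zip(via_fold, direct))
```

The fold halves the number of samples before the transform, which is why 2n windowed samples produce only n spectral values.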
[0033]
Method for decoding an encoded audio or image signal in the presence of transients, the signal comprising a first encoded frame and a second encoded frame, characterized by comprising: processing (824) the first encoded frame and the second encoded frame to obtain a first frame of spectral values and a second frame of spectral values, the first and second frames comprising a distortion part; applying (826) a transformation to the first frame of spectral values using a first window function (1500) to obtain a first block of samples, applying another transformation to a first part of the second frame of spectral values using a second window function (1502), and applying one or more further transformations to a second part of the second frame of spectral values using one or more third window functions (1503) to obtain a second sample block, wherein the first window function (1500), the second window function (1502) and the third window function form a multiple overlap zone (1630); and post-processing (828) the second sample block using an unfold operation to obtain a second post-processed sample block with a portion of the second sample block overlapping the first sample block in the multiple overlap zone, managing the windows of the second post-processed block of samples using an auxiliary window function (1100), and overlap-adding the windowed second post-processed block of samples and the first block of samples to obtain a decoded audio or image signal (1180).
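The final windowing and overlap-add step of the decoding method can be sketched with a conventional MDCT filter bank, in which each block overlaps only its two neighbours. The multiple-overlap handling of the claim is not modelled here. The sine window, the frame size of 8 and the function names `analyse` and `synthesise` are assumptions made for the example. The sketch shows that the aliasing in each inverse-transformed block cancels once neighbouring windowed blocks are added.

```python
import math

def sine_window(n):
    """Sine window of length 2n; satisfies w[j]**2 + w[j + n]**2 == 1."""
    return [math.sin(math.pi * (j + 0.5) / (2 * n)) for j in range(2 * n)]

def mdct(block):
    """Forward MDCT: 2n time samples to n spectral values."""
    n = len(block) // 2
    return [sum(block[j] * math.cos(math.pi / n * (j + 0.5 + n / 2) * (k + 0.5))
                for j in range(2 * n))
            for k in range(n)]

def imdct(spec):
    """Inverse MDCT with the 2/n scaling used for windowed overlap-add."""
    n = len(spec)
    return [2.0 / n * sum(spec[k] * math.cos(math.pi / n * (j + 0.5 + n / 2) * (k + 0.5))
                          for k in range(n))
            for j in range(2 * n)]

def analyse(signal, n):
    """Window and transform blocks of 2n samples taken at a hop of n samples."""
    w = sine_window(n)
    padded = [0.0] * n + list(signal) + [0.0] * n
    return [mdct([w[j] * padded[s + j] for j in range(2 * n)])
            for s in range(0, len(padded) - n, n)]

def synthesise(frames, n):
    """Inverse-transform, window and overlap-add the frames."""
    w = sine_window(n)
    out = [0.0] * (n * (len(frames) + 1))
    for i, spec in enumerate(frames):
        block = imdct(spec)
        for j in range(2 * n):
            out[i * n + j] += w[j] * block[j]
    return out[n:-n]  # drop the zero padding added in analyse()

n = 8
# Test signal with a single spike standing in for a transient.
signal = [math.sin(0.3 * t) + (1.0 if t == 13 else 0.0) for t in range(4 * n)]
frames = analyse(signal, n)
rebuilt = synthesise(frames, n)
max_error = max(abs(p - q) for p, q in zip(signal, rebuilt))
```

Reconstruction is exact here only because no quantisation takes place between analysis and synthesis; in a codec the spectral values would be quantised and coded in between.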
Similar technologies:
Publication number | Publication date | Patent title
BR112015019270B1|2021-02-17|apparatus and method for creating an encoded audio or image signal in the presence of transients, apparatus and method for decoding an encoded audio or image signal in the presence of transients
CN107077854B|2021-06-22|Processor, method and computer program for processing an audio signal using truncated analysis or synthesis window overlap portions
Patent family:
Publication number | Publication date
US20170323650A1|2017-11-09|
RU2015139597A|2017-03-27|
SG11201506542QA|2015-09-29|
HK1219343A1|2017-03-31|
RU2626666C2|2017-07-31|
PL2959481T3|2017-10-31|
KR101764725B1|2017-08-03|
HK1218988A1|2017-03-17|
TW201443878A|2014-11-16|
TW201447868A|2014-12-16|
EP2959481A1|2015-12-30|
EP2959481B1|2017-04-26|
US10832694B2|2020-11-10|
PT2959482T|2019-08-02|
AR094845A1|2015-09-02|
US9947329B2|2018-04-17|
BR112015019270A8|2019-11-12|
AU2014220725B2|2016-11-17|
JP6175148B2|2017-08-02|
RU2015139596A|2017-03-27|
WO2014128194A1|2014-08-28|
CN110047498A|2019-07-23|
AU2014220722A1|2015-10-08|
CN105074819A|2015-11-18|
AU2014220722B2|2016-09-15|
US20200294517A1|2020-09-17|
BR112015019270A2|2017-07-18|
CN105378835A|2016-03-02|
CA2901186A1|2014-08-28|
JP2016507788A|2016-03-10|
ES2736309T3|2019-12-27|
AR096576A1|2016-01-20|
CN110232929A|2019-09-13|
US20190371346A1|2019-12-05|
AU2014220725A1|2015-10-08|
MX348505B|2017-06-14|
ES2634621T3|2017-09-28|
CN105074819B|2019-06-04|
CA2901186C|2018-02-20|
MY185210A|2021-04-30|
EP2959482A1|2015-12-30|
CA2900437C|2020-07-21|
KR20150120477A|2015-10-27|
KR20150126864A|2015-11-13|
US20210065725A1|2021-03-04|
JP6196324B2|2017-09-13|
WO2014128197A1|2014-08-28|
US10685662B2|2020-06-16|
TWI550599B|2016-09-21|
SG11201506543WA|2015-09-29|
MX348506B|2017-06-14|
EP3525207A1|2019-08-14|
KR101764726B1|2017-08-14|
BR112015019543B1|2022-01-11|
MX2015010595A|2015-12-16|
BR112015019543A2|2017-07-18|
PL2959482T3|2019-10-31|
RU2625560C2|2017-07-14|
JP2016513283A|2016-05-12|
TWI550600B|2016-09-21|
CA2900437A1|2014-08-28|
US20160078875A1|2016-03-17|
CN110097889A|2019-08-06|
CN105378835B|2019-10-01|
US10354662B2|2019-07-16|
MX2015010596A|2015-12-16|
TR201910956T4|2019-08-21|
PT2959481T|2017-07-13|
EP2959482B1|2019-05-01|
US20160050420A1|2016-02-18|
Cited documents:
Publication number | Filing date | Publication date | Applicant | Patent title

US4920426A|1986-11-10|1990-04-24|Kokusai Denshin Denwa Co., Ltd.|Image coding system coding digital image signals by forming a histogram of a coefficient signal sequence to estimate an amount of information|
DE3902948A1|1989-02-01|1990-08-09|Telefunken Fernseh & Rundfunk|METHOD FOR TRANSMITTING A SIGNAL|
DE59002222D1|1989-10-06|1993-09-09|Telefunken Fernseh & Rundfunk|METHOD FOR TRANSMITTING A SIGNAL.|
US5502789A|1990-03-07|1996-03-26|Sony Corporation|Apparatus for encoding digital data with reduction of perceptible noise|
CN1062963C|1990-04-12|2001-03-07|多尔拜实验特许公司|Adaptive-block-lenght, adaptive-transform, and adaptive-window transform coder, decoder, and encoder/decoder for high-quality audio|
JP3186307B2|1993-03-09|2001-07-11|ソニー株式会社|Compressed data recording apparatus and method|
US5701389A|1995-01-31|1997-12-23|Lucent Technologies, Inc.|Window switching based on interblock and intrablock frequency band energy|
KR0154387B1|1995-04-01|1998-11-16|김주용|Digital audio encoder applying multivoice system|
JP3552811B2|1995-09-29|2004-08-11|三菱電機株式会社|Digital video signal encoding device and decoding device|
US5848391A|1996-07-11|1998-12-08|Fraunhofer-Gesellschaft Zur Forderung Der Angewandten Forschung E.V.|Method subband of coding and decoding audio signals using variable length windows|
US6131084A|1997-03-14|2000-10-10|Digital Voice Systems, Inc.|Dual subframe quantization of spectral magnitudes|
DE19736669C1|1997-08-22|1998-10-22|Fraunhofer Ges Forschung|Beat detection method for time discrete audio signal|
JP2000000247A|1998-06-15|2000-01-07|Yoshihiro Adachi|Ultra-lug screw|
US6173255B1|1998-08-18|2001-01-09|Lockheed Martin Corporation|Synchronized overlap add voice processing using windows and one bit correlators|
DE10000934C1|2000-01-12|2001-09-27|Fraunhofer Ges Forschung|Device and method for determining an encoding block pattern of a decoded signal|
JP2002118517A|2000-07-31|2002-04-19|Sony Corp|Apparatus and method for orthogonal transformation, apparatus and method for inverse orthogonal transformation, apparatus and method for transformation encoding as well as apparatus and method for decoding|
JP4596197B2|2000-08-02|2010-12-08|ソニー株式会社|Digital signal processing method, learning method and apparatus, and program storage medium|
FR2822980B1|2001-03-29|2003-07-04|Ela Medical Sa|METHOD FOR PROCESSING ELECTOGRAM DATA OF AN ACTIVE IMPLANTABLE MEDICAL DEVICE FOR ASSISTANCE TO DIAGNOSIS BY A PRACTITIONER|
DE60225130T2|2001-05-10|2009-02-26|Dolby Laboratories Licensing Corp., San Francisco|IMPROVED TRANSIENT PERFORMANCE FOR LOW-BITRATE CODERS THROUGH SUPPRESSION OF THE PREVIOUS NOISE|
US7460993B2|2001-12-14|2008-12-02|Microsoft Corporation|Adaptive window-size selection in transform coding|
EP1394772A1|2002-08-28|2004-03-03|Deutsche Thomson-Brandt Gmbh|Signaling of window switchings in a MPEG layer 3 audio data stream|
US7876966B2|2003-03-11|2011-01-25|Spyder Navigations L.L.C.|Switching between coding schemes|
US7325023B2|2003-09-29|2008-01-29|Sony Corporation|Method of making a window type decision based on MDCT data in audio encoding|
DE10345996A1|2003-10-02|2005-04-28|Fraunhofer Ges Forschung|Apparatus and method for processing at least two input values|
KR20070001185A|2004-03-17|2007-01-03|코닌클리케 필립스 일렉트로닉스 엔.브이.|Audio coding|
US7937271B2|2004-09-17|2011-05-03|Digital Rise Technology Co., Ltd.|Audio decoding using variable-length codebook application ranges|
US7630902B2|2004-09-17|2009-12-08|Digital Rise Technology Co., Ltd.|Apparatus and methods for digital audio coding using codebook application ranges|
CN101061533B|2004-10-26|2011-05-18|松下电器产业株式会社|Sound encoding device and sound encoding method|
KR100668319B1|2004-12-07|2007-01-12|삼성전자주식회사|Method and apparatus for transforming an audio signal and method and apparatus for encoding adaptive for an audio signal, method and apparatus for inverse-transforming an audio signal and method and apparatus for decoding adaptive for an audio signal|
US7386445B2|2005-01-18|2008-06-10|Nokia Corporation|Compensation of transient effects in transform coding|
RU2409874C9|2005-11-04|2011-05-20|Нокиа Корпорейшн|Audio signal compression|
US7987089B2|2006-07-31|2011-07-26|Qualcomm Incorporated|Systems and methods for modifying a zero pad region of a windowed frame of an audio signal|
US8744862B2|2006-08-18|2014-06-03|Digital Rise Technology Co., Ltd.|Window selection based on transient detection and location to provide variable time resolution in processing frame-based data|
DE102006051673A1|2006-11-02|2008-05-15|Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V.|Apparatus and method for reworking spectral values and encoders and decoders for audio signals|
GB2443832B|2006-11-14|2010-08-18|Schlumberger Holdings|Method and system of deploying one or more optical fiber waveguides in conjunction with a pipeline|
KR20080053739A|2006-12-11|2008-06-16|삼성전자주식회사|Apparatus and method for encoding and decoding by applying to adaptive window size|
KR101016224B1|2006-12-12|2011-02-25|프라운호퍼-게젤샤프트 추르 푀르데룽 데어 안제반텐 포르슝 에 파우|Encoder, decoder and methods for encoding and decoding data segments representing a time-domain data stream|
FR2911227A1|2007-01-05|2008-07-11|France Telecom|Digital audio signal coding/decoding method for telecommunication application, involves applying short and window to code current frame, when event is detected at start of current frame and not detected in current frame, respectively|
FR2911228A1|2007-01-05|2008-07-11|France Telecom|TRANSFORMED CODING USING WINDOW WEATHER WINDOWS.|
RU2459283C2|2007-03-02|2012-08-20|Панасоник Корпорэйшн|Coding device, decoding device and method|
EP2015293A1|2007-06-14|2009-01-14|Deutsche Thomson OHG|Method and apparatus for encoding and decoding an audio signal using adaptively switched temporal resolution in the spectral domain|
CA2697920C|2007-08-27|2018-01-02|Telefonaktiebolaget L M Ericsson |Transient detector and method for supporting encoding of an audio signal|
WO2009119592A1|2008-03-25|2009-10-01|旭化成ケミカルズ株式会社|Elastomer composition and storage cover for airbag system|
US8447591B2|2008-05-30|2013-05-21|Microsoft Corporation|Factorization of overlapping tranforms into two block transforms|
CN101836253B|2008-07-11|2012-06-13|弗劳恩霍夫应用研究促进协会|Apparatus and method for calculating bandwidth extension data using a spectral tilt controlling framing|
EP2144230A1|2008-07-11|2010-01-13|Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V.|Low bitrate audio encoding/decoding scheme having cascaded switches|
PT2301011T|2008-07-11|2018-10-26|Fraunhofer Ges Forschung|Method and discriminator for classifying different segments of an audio signal comprising speech and music segments|
CA2871372C|2008-07-11|2016-08-23|Fraunhofer-Gesellschaft Zur Forderung Der Angewandten Forschung E.V.|Audio encoder and decoder for encoding and decoding audio samples|
ES2401487T3|2008-07-11|2013-04-22|Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V.|Apparatus and procedure for encoding / decoding an audio signal using a foreign signal generation switching scheme|
US8380498B2|2008-09-06|2013-02-19|GH Innovation, Inc.|Temporal envelope coding of energy attack signal by using attack point location|
KR101315617B1|2008-11-26|2013-10-08|광운대학교 산학협력단|Unified speech/audio coder processing windows sequence based mode switching|
US8457975B2|2009-01-28|2013-06-04|Fraunhofer-Gesellschaft Zur Foerderung Der Angewandten Forschung E.V.|Audio decoder, audio encoder, methods for decoding and encoding an audio signal and computer program|
ES2374486T3|2009-03-26|2012-02-17|Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V.|DEVICE AND METHOD FOR HANDLING AN AUDIO SIGNAL.|
ES2673637T3|2009-06-23|2018-06-25|Voiceage Corporation|Prospective cancellation of time domain overlap with weighted or original signal domain application|
MX2012004116A|2009-10-08|2012-05-22|Fraunhofer Ges Forschung|Multi-mode audio signal decoder, multi-mode audio signal encoder, methods and computer program using a linear-prediction-coding based noise shaping.|
KR101137652B1|2009-10-14|2012-04-23|광운대학교 산학협력단|Unified speech/audio encoding and decoding apparatus and method for adjusting overlap area of window based on transition|
CN103109318B|2010-07-08|2015-08-05|弗兰霍菲尔运输应用研究公司|Utilize the scrambler of forward direction aliasing technology for eliminating|
CN104718572B|2012-06-04|2018-07-31|三星电子株式会社|Audio coding method and device, audio-frequency decoding method and device and the multimedia device using this method and device|
KR20140075466A|2012-12-11|2014-06-19|삼성전자주식회사|Encoding and decoding method of audio signal, and encoding and decoding apparatus of audio signal|
EP2830058A1|2013-07-22|2015-01-28|Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V.|Frequency-domain audio coding supporting transform length switching|
EP2980795A1|2014-07-28|2016-02-03|Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V.|Audio encoding and decoding using a frequency domain processor, a time domain processor and a cross processor for initialization of the time domain processor|
EP2980794A1|2014-07-28|2016-02-03|Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V.|Audio encoder and decoder using a frequency domain processor and a time domain processor|
EP2980791A1|2014-07-28|2016-02-03|Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V.|Processor, method and computer program for processing an audio signal using truncated analysis or synthesis window overlap portions|
FR3024581A1|2014-07-29|2016-02-05|Orange|DETERMINING A CODING BUDGET OF A TRANSITION FRAME LPD / FD|
EP3107096A1|2015-06-16|2016-12-21|Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V.|Downscaled decoding|
WO2017050398A1|2015-09-25|2017-03-30|Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V.|Encoder, decoder and methods for signal-adaptive switching of the overlap ratio in audio transform coding|
EP3182411A1|2015-12-14|2017-06-21|Fraunhofer-Gesellschaft zur Förderung der angewandten Forschung e.V.|Apparatus and method for processing an encoded audio signal|
JP6976277B2|2016-06-22|2021-12-08|ドルビー・インターナショナル・アーベー|Audio decoders and methods for converting digital audio signals from the first frequency domain to the second frequency domain|
US10210874B2|2017-02-03|2019-02-19|Qualcomm Incorporated|Multi channel coding|
EP3692521A1|2017-10-06|2020-08-12|Sony Europe B.V.|Audio file envelope based on rms power in sequences of sub-windows|
TWI681384B|2018-08-01|2020-01-01|瑞昱半導體股份有限公司|Audio processing method and audio equalizer|
CN113596447A|2019-03-09|2021-11-02|杭州海康威视数字技术股份有限公司|Method, decoding end, encoding end and system for encoding and decoding|
Legal status:
2018-11-13| B06F| Objections, documents and/or translations needed after an examination request according [chapter 6.6 patent gazette]|
2020-06-02| B06U| Preliminary requirement: requests with searches performed by other patent offices: procedure suspended [chapter 6.21 patent gazette]|
2020-12-08| B09A| Decision: intention to grant [chapter 9.1 patent gazette]|
2021-02-17| B16A| Patent or certificate of addition of invention granted [chapter 16.1 patent gazette]|Free format text: TERM OF VALIDITY: 20 (TWENTY) YEARS COUNTED FROM 20/02/2014, SUBJECT TO THE LEGAL CONDITIONS. |
Priority:
Application number | Filing date | Patent title
US201361767115P| true| 2013-02-20|2013-02-20|
US61/767,115|2013-02-20|
PCT/EP2014/053287|WO2014128194A1|2013-02-20|2014-02-20|Apparatus and method for generating an encoded signal or for decoding an encoded audio signal using a multi overlap portion|